When examining the time course data, we noticed an interesting pattern: it appeared possible to predict an individual’s group from the difference between their load effect during encoding and their load effect during delay. Low capacity subjects had small load effects at both encoding and delay, medium capacity subjects had large load effects at both encoding and delay, and high capacity subjects had large load effects at encoding but small ones at delay. As such, we wanted to create a model that used this information to predict span.
To do so, we extracted data from TR 6 for the encoding period and TR 8 for the delay period. Each represents one TR’s worth of data in the middle of where we’d expect activity for that period (based on our model of the task convolved with a hemodynamic response function) and where we see maximal differences between groups. In addition to the raw load effects, we created two composite variables that we expected to capture the differences between groups. The first was simply the difference between the load effects at encoding and delay, and the second was the sum of that difference and the load effect at encoding. These measures allow us to create a series of regression models to predict span, BPRS total score, and accuracy at high load.
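As a toy sketch of the two composite variables (the numbers below are made up for illustration; the real values come from the TR 6 and TR 8 load effects):

```r
# Hypothetical per-subject load effects for one ROI (not real data):
encoding <- c(0.2, 0.5, 0.6)  # load effect at TR 6 (encoding)
delay    <- c(0.1, 0.4, 0.1)  # load effect at TR 8 (delay)

# Composite 1: difference between encoding and delay load effects
encoding_delay <- encoding - delay

# Composite 2: encoding load effect plus that difference
encoding_delay_comb <- encoding + encoding_delay
```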
In addition to the univariate load effects, we have also seen a relationship with various measures reflecting multivariate representation across and within trials, including the probability of an MVPA classifier predicting a face at any given point in a trial and the similarity of multivariate representations across and within trials. Both of these measures were computed at each individual trial and averaged, and were also computed from the average over many trials (which served as a template for the canonical trial type). We added these multivariate measures to see whether they improved performance when added to regression models that contained only variables based on univariate load effects.
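The within-trial similarity idea can be sketched as a simple pattern correlation (the vectors below are hypothetical, not actual voxel data):

```r
# Toy within-trial similarity: correlate the multivoxel pattern at encoding
# with the pattern at delay for one trial (values invented for illustration).
encoding_pattern <- c(1.0, 2.0, 3.0, 4.0, 5.0)   # "voxel" activities at encoding
delay_pattern    <- c(1.1, 1.9, 3.2, 3.8, 5.1)   # same voxels during delay

# Pearson correlation as the similarity measure
similarity <- cor(encoding_pattern, delay_pattern)
```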
Finally, we added structural measures, resting-state measures, and EEG measures to try to explain variance across multiple levels/modalities.
Overall, we were best able to explain variance in high-load accuracy, with our best model explaining 26% of the variance. In contrast, we were only able to explain 21% of the variance in span and 12% of the variance in BPRS scores.
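As a reminder of what “variance explained” means here, a minimal toy sketch of R² from a linear model (simulated data; not the actual span/accuracy/BPRS models):

```r
# Simulated predictor and outcome (not the real data)
set.seed(123)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# R^2 of a simple linear model: proportion of variance in y explained by x
fit <- lm(y ~ x)
r2 <- summary(fit)$r.squared
```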
The individual components related to each of these measures will be discussed further below, but one interesting thing to note is that there is little overlap between them. There is some overlap in the EEG measures between BPRS and span, but none between span and accuracy or between accuracy and BPRS. Another interesting observation is that, when comparing the PCs related to accuracy and span, those related to accuracy seem to be driven largely by encoding and by measures related to visual processing, while the PCs related to span include more connectivity/multivariate measures and are also driven by the delay period, or by processing in DFR regions.
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.1
## ✓ tidyr 1.1.1 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
library(rmatio)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(patchwork)
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:patchwork':
##
## area
## The following object is masked from 'package:dplyr':
##
## select
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
library(glmnet)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
## Loaded glmnet 4.0-2
library(tidymodels)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────── tidymodels 0.1.1 ──
## ✓ broom 0.7.0 ✓ recipes 0.1.13
## ✓ dials 0.0.8 ✓ rsample 0.0.7
## ✓ infer 0.5.3 ✓ tune 0.1.1
## ✓ modeldata 0.0.2 ✓ workflows 0.1.3
## ✓ parsnip 0.1.3 ✓ yardstick 0.0.7
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
## x psych::%+%() masks ggplot2::%+%()
## x scales::alpha() masks psych::alpha(), ggplot2::alpha()
## x scales::discard() masks purrr::discard()
## x Matrix::expand() masks tidyr::expand()
## x dplyr::filter() masks stats::filter()
## x recipes::fixed() masks stringr::fixed()
## x dplyr::lag() masks stats::lag()
## x caret::lift() masks purrr::lift()
## x Matrix::pack() masks tidyr::pack()
## x yardstick::precision() masks caret::precision()
## x yardstick::recall() masks caret::recall()
## x MASS::select() masks dplyr::select()
## x yardstick::sensitivity() masks caret::sensitivity()
## x yardstick::spec() masks readr::spec()
## x yardstick::specificity() masks caret::specificity()
## x recipes::step() masks stats::step()
## x Matrix::unpack() masks tidyr::unpack()
load('data/behav.RData')
load('data/structural_measures.RData')
FA_Data <- read.csv("data/RDoC_DTI_FA_JHU_ICBM.csv")
load('data/EEG_for_PCs.RData')
load('data/connectivity_data.RData')
load('data/load_effects_DFR.RData')
load('data/split_groups_info.RData')
load('data/ITC_fusiform.RData')
similarity_fusiform <- similarity_temp
load('data/ITC_DFR_delay.RData')
similarity_DFR <- similarity_temp
load("data/MVPA_fusiform.RData")
individual_trial_probs_fusiform <- individual_trial_averages_probs
averages_from_template_fusiform <- averages_from_template
load("data/MVPA_DFR_delay_mask.RData")
individual_trial_probs_DFR <- individual_trial_averages_probs
averages_from_template_DFR <- averages_from_template
load("data/MVPA_HPC.RData")
individual_trial_probs_HPC <- individual_trial_averages_probs
averages_from_template_HPC <- averages_from_template
DFR_ROIs <- c("L aMFG","L dlPFC","L dMFG","L IPS","L preSMA","R dlPFC","R IPS","R medParietal","all delay ROIs")
source("helper_fxns/select_period_average.R")
source("helper_fxns/calc_r_squared.R")
First, we’re going to load in the time courses of activity from the DFR regions. We’re doing this a little differently from last time because we don’t want the interpolated data; we just want the version with 14 TRs.
We’re going to remove the subjects that aren’t included in the split-group analysis and put the data in a slightly easier-to-use format.
temp <- read.mat('data/RSA_DFR_trials.mat')
factors <- matrix(nrow=170)
for (idx in seq.int(1,170)){
if (constructs_fMRI$PTID[idx] %in% WM_groups[["high"]]$PTID){
factors[idx] <- "high"
}else if (constructs_fMRI$PTID[idx] %in% WM_groups[["med"]]$PTID){
factors[idx] <- "med"
}else if (constructs_fMRI$PTID[idx] %in% WM_groups[["low"]]$PTID){
factors[idx] <- "low"
}else{
factors[idx] <- "not_incl"
}
}
factors <- factor(factors, levels=c("low","med","high","not_incl"))
trial_activity_high <- temp[["trial_avg_activity_high"]]
trial_activity_low <- temp[["trial_avg_activity_low"]]
all_data <- list(subjs=constructs_fMRI$PTID,
WMC = factors,
L_aMFG = list(
high = data.frame(trial_activity_high[,,1,1]),
low = data.frame(trial_activity_low[,,1,1])
),
L_dlPFC = list(
high = data.frame(trial_activity_high[,,1,2]),
low = data.frame(trial_activity_low[,,1,2])
),
L_dMFG = list(
high = data.frame(trial_activity_high[,,1,3]),
low = data.frame(trial_activity_low[,,1,3])
),
L_IPS = list(
high = data.frame(trial_activity_high[,,1,4]),
low = data.frame(trial_activity_low[,,1,4])
),
L_preSMA = list(
high = data.frame(trial_activity_high[,,1,5]),
low = data.frame(trial_activity_low[,,1,5])
),
R_dlPFC = list(
high = data.frame(trial_activity_high[,,1,6]),
low = data.frame(trial_activity_low[,,1,6])
),
R_IPS = list(
high = data.frame(trial_activity_high[,,1,7]),
low = data.frame(trial_activity_low[,,1,7])
),
R_medParietal = list(
high = data.frame(trial_activity_high[,,1,8]),
low = data.frame(trial_activity_low[,,1,8])
),
all_delay = list(
high = data.frame(trial_activity_high[,,1,9]),
low = data.frame(trial_activity_low[,,1,9])
)
)
Now, we’re going to calculate load effects for each ROI at every time point.
for (ROI in seq.int(3,11)){
all_data[[ROI]][["load_effect"]] <- all_data[[ROI]][["high"]]-all_data[[ROI]][["low"]]
temp <- data.frame(group = all_data[["WMC"]], all_data[[ROI]][["load_effect"]])
all_data[[ROI]][["SVM"]] <- temp
}
Now, let’s pull out the encoding and delay values. We’re also going to calculate the difference between encoding and delay, and encoding + (encoding - delay), because we expect these to be useful. We’ll include all these measures, plus span, in a convenient dataframe.
for (ROI in seq.int(3,11)){
temp <- data.frame(matrix(nrow=170, ncol=6))
colnames(temp) <- c("group","encoding","delay","encoding_delay","encoding_delay_comb" ,"span")
temp$group <- all_data[["WMC"]]
temp$span <- constructs_fMRI$omnibus_span_no_DFR_MRI
temp$encoding <- all_data[[ROI]][["load_effect"]][,6]
#temp$delay <- rowMeans(all_data[[ROI]][["load_effect"]][,8:9])
temp$delay <- all_data[[ROI]][["load_effect"]][,8]
temp$encoding_delay <- temp$encoding - temp$delay
temp$encoding_delay_comb <- temp$encoding + temp$encoding_delay
temp$high_acc <- p200_data$XDFR_MRI_ACC_L3[p200_data$PTID %in% constructs_fMRI$PTID]
temp$BPRS <- p200_clinical_zscores$BPRS_TOT[p200_clinical_zscores$PTID %in% constructs_fMRI$PTID]
all_data[[ROI]][["SVM_2"]] <- temp
}
All regions except L aMFG show a significant correlation between this combined measure and span.
ROI_plot_list <- list()
for (ROI in seq.int(3,11)){
# correlate span with the same combined measure that is plotted
correlation_test <- cor.test(all_data[[ROI]][["SVM_2"]]$span,all_data[[ROI]][["SVM_2"]]$encoding_delay_comb)
ROI_plot_list[[DFR_ROIs[ROI-2]]] <- ggplot(data=all_data[[ROI]][["SVM_2"]])+
geom_point(aes(x=span,y=encoding_delay_comb,color=group))+
stat_smooth(aes(x=span,y=encoding_delay_comb),method="lm",color="black")+
ggtitle(paste(DFR_ROIs[ROI-2], ", r = ",round(correlation_test$estimate,digits=3)))+
xlab("WM Span")+
ylab("Encoding LE + LE difference")+
theme_classic()
}
(ROI_plot_list[[1]] + ROI_plot_list[[2]]) /
(ROI_plot_list[[3]] + ROI_plot_list[[4]]) +
plot_annotation(title = "Correlation between (encoding LE + encoding/delay LE difference) and span ")+
plot_layout(guides="collect")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
(ROI_plot_list[[5]] + ROI_plot_list[[6]]) /
(ROI_plot_list[[7]] + ROI_plot_list[[8]]) +
plot_layout(guides="collect")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
ROI_plot_list[[9]]
## `geom_smooth()` using formula 'y ~ x'
data_for_reg <- data.frame(span = all_data[["all_delay"]][["SVM_2"]]$span,
high_acc = all_data[["all_delay"]][["SVM_2"]]$high_acc,
BPRS = all_data[["all_delay"]][["SVM_2"]]$BPRS,
L_aMFG_enc = all_data[[3]][["SVM_2"]]$encoding,
L_aMFG_delay = all_data[[3]][["SVM_2"]]$delay,
L_dlPFC_enc = all_data[[4]][["SVM_2"]]$encoding,
L_dlPFC_delay = all_data[[4]][["SVM_2"]]$delay,
L_dMFG_enc = all_data[[5]][["SVM_2"]]$encoding,
L_dMFG_delay = all_data[[5]][["SVM_2"]]$delay,
L_IPS_enc = all_data[[6]][["SVM_2"]]$encoding,
L_IPS_delay = all_data[[6]][["SVM_2"]]$delay,
L_preSMA_enc = all_data[[7]][["SVM_2"]]$encoding,
L_preSMA_delay = all_data[[7]][["SVM_2"]]$delay,
R_dlPFC_enc = all_data[[8]][["SVM_2"]]$encoding,
R_dlPFC_delay = all_data[[8]][["SVM_2"]]$delay,
R_IPS_enc = all_data[[9]][["SVM_2"]]$encoding,
R_IPS_delay = all_data[[9]][["SVM_2"]]$delay,
R_medPar_enc = all_data[[10]][["SVM_2"]]$encoding,
R_medPar_delay = all_data[[10]][["SVM_2"]]$delay,
all_delay_enc = all_data[[11]][["SVM_2"]]$encoding,
all_delay_delay = all_data[[11]][["SVM_2"]]$delay,
L_aMFG_enc_delay = all_data[[3]][["SVM_2"]]$encoding_delay,
L_dlPFC_enc_delay = all_data[[4]][["SVM_2"]]$encoding_delay,
L_dMFG_enc_delay = all_data[[5]][["SVM_2"]]$encoding_delay,
L_IPS_enc_delay = all_data[[6]][["SVM_2"]]$encoding_delay,
L_preSMA_enc_delay = all_data[[7]][["SVM_2"]]$encoding_delay,
R_dlPFC_enc_delay = all_data[[8]][["SVM_2"]]$encoding_delay,
R_IPS_enc_delay = all_data[[9]][["SVM_2"]]$encoding_delay,
R_medPar_enc_delay = all_data[[10]][["SVM_2"]]$encoding_delay,
L_aMFG_enc_delay_comb = all_data[[3]][["SVM_2"]]$encoding_delay_comb,
L_dlPFC_enc_delay_comb = all_data[[4]][["SVM_2"]]$encoding_delay_comb,
L_dMFG_enc_delay_comb = all_data[[5]][["SVM_2"]]$encoding_delay_comb,
L_IPS_enc_delay_comb = all_data[[6]][["SVM_2"]]$encoding_delay_comb,
L_preSMA_enc_delay_comb = all_data[[7]][["SVM_2"]]$encoding_delay_comb,
R_dlPFC_enc_delay_comb = all_data[[8]][["SVM_2"]]$encoding_delay_comb,
R_IPS_enc_delay_comb = all_data[[9]][["SVM_2"]]$encoding_delay_comb,
R_medPar_enc_delay_comb = all_data[[10]][["SVM_2"]]$encoding_delay_comb,
R_all_delay_enc_delay_comb = all_data[[11]][["SVM_2"]]$encoding_delay_comb,
high_corr_fus_enc = similarity_fusiform[["high_correct_avg"]]$X6,
high_corr_fus_delay = similarity_fusiform[["high_correct_avg"]]$X8,
high_incorr_fus_enc = similarity_fusiform[["high_incorrect_avg"]]$X6,
high_incorr_fus_del = similarity_fusiform[["high_incorrect_avg"]]$X8,
low_corr_fus_enc = similarity_fusiform[["low_correct_avg"]]$X6,
low_corr_fus_del = similarity_fusiform[["low_correct_avg"]]$X8,
correct_encoding_to_correct_delay_fus = similarity_fusiform[["correct_encoding_to_correct_delay"]],
correct_enc_to_delay_fus_high_corr = similarity_fusiform[["correct_encoding_to_delay_avg"]][,4],
enc_to_correct_delay_fus_high_corr = similarity_fusiform[["encoding_to_correct_delay_avg"]][,4],
high_corr_DFR_enc = similarity_DFR[["high_correct_avg"]]$X6,
high_corr_DFR_delay = similarity_DFR[["high_correct_avg"]]$X8,
high_incorr_DFR_enc = similarity_DFR[["high_incorrect_avg"]]$X6,
high_incorr_DFR_del = similarity_DFR[["high_incorrect_avg"]]$X8,
low_corr_DFR_enc = similarity_DFR[["low_correct_avg"]]$X6,
low_corr_DFR_del = similarity_DFR[["low_correct_avg"]]$X8,
correct_encoding_to_correct_delay_DFR = similarity_DFR[["correct_encoding_to_correct_delay"]],
correct_enc_to_delay_DFR_high_corr = similarity_DFR[["correct_encoding_to_delay_avg"]][,4],
enc_to_correct_delay_DFR_high_corr = similarity_DFR[["encoding_to_correct_delay_avg"]][,4],
indiv_trial_avgs_high_correct_fusiform_enc = individual_trial_probs_fusiform[["high_correct"]]$V6,
indiv_trial_avgs_high_correct_fusiform_delay = individual_trial_probs_fusiform[["high_correct"]]$V8,
indiv_trial_avgs_high_correct_fusiform_probe = individual_trial_probs_fusiform[["high_correct"]]$V11,
indiv_trial_avgs_high_correct_DFR_enc = individual_trial_probs_DFR[["high_correct"]]$V6,
indiv_trial_avgs_high_correct_DFR_delay = individual_trial_probs_DFR[["high_correct"]]$V8,
indiv_trial_avgs_high_correct_DFR_probe = individual_trial_probs_DFR[["high_correct"]]$V11,
indiv_trial_avgs_high_incorrect_fusiform_enc = individual_trial_probs_fusiform[["high_incorrect"]]$V6,
indiv_trial_avgs_high_incorrect_fusiform_delay = individual_trial_probs_fusiform[["high_incorrect"]]$V8,
indiv_trial_avgs_high_incorrect_fusiform_probe = individual_trial_probs_fusiform[["high_incorrect"]]$V11,
indiv_trial_avgs_high_incorrect_DFR_enc = individual_trial_probs_DFR[["high_incorrect"]]$V6,
indiv_trial_avgs_high_incorrect_DFR_delay = individual_trial_probs_DFR[["high_incorrect"]]$V8,
indiv_trial_avgs_high_incorrect_DFR_probe = individual_trial_probs_DFR[["high_incorrect"]]$V11,
indiv_trial_avgs_low_correct_fusiform_enc = individual_trial_probs_fusiform[["low_correct"]]$V6,
indiv_trial_avgs_low_correct_fusiform_delay = individual_trial_probs_fusiform[["low_correct"]]$V8,
indiv_trial_avgs_low_correct_fusiform_probe = individual_trial_probs_fusiform[["low_correct"]]$V11,
indiv_trial_avgs_low_correct_DFR_enc = individual_trial_probs_DFR[["low_correct"]]$V6,
indiv_trial_avgs_low_correct_DFR_delay = individual_trial_probs_DFR[["low_correct"]]$V8,
indiv_trial_avgs_low_correct_DFR_probe = individual_trial_probs_DFR[["low_correct"]]$V11,
averages_from_template_high_correct_fusiform_enc = averages_from_template_fusiform[["high_correct"]]$V6,
averages_from_template_high_correct_fusiform_delay = averages_from_template_fusiform[["high_correct"]]$V8,
averages_from_template_high_correct_fusiform_probe = averages_from_template_fusiform[["high_correct"]]$V11,
averages_from_template_high_correct_DFR_enc = averages_from_template_DFR[["high_correct"]]$V6,
averages_from_template_high_correct_DFR_delay = averages_from_template_DFR[["high_correct"]]$V8,
averages_from_template_high_correct_DFR_probe = averages_from_template_DFR[["high_correct"]]$V11,
averages_from_template_high_incorrect_fusiform_enc = averages_from_template_fusiform[["high_incorrect"]]$V6,
averages_from_template_high_incorrect_fusiform_delay = averages_from_template_fusiform[["high_incorrect"]]$V8,
averages_from_template_high_incorrect_fusiform_probe = averages_from_template_fusiform[["high_incorrect"]]$V11,
averages_from_template_high_incorrect_DFR_enc = averages_from_template_DFR[["high_incorrect"]]$V6,
averages_from_template_high_incorrect_DFR_delay = averages_from_template_DFR[["high_incorrect"]]$V8,
averages_from_template_high_incorrect_DFR_probe = averages_from_template_DFR[["high_incorrect"]]$V11,
averages_from_template_low_correct_fusiform_enc = averages_from_template_fusiform[["low_correct"]]$V6,
averages_from_template_low_correct_fusiform_delay = averages_from_template_fusiform[["low_correct"]]$V8,
averages_from_template_low_correct_fusiform_probe = averages_from_template_fusiform[["low_correct"]]$V11,
averages_from_template_low_correct_DFR_enc = averages_from_template_DFR[["low_correct"]]$V6,
averages_from_template_low_correct_DFR_delay = averages_from_template_DFR[["low_correct"]]$V8,
averages_from_template_low_correct_DFR_probe = averages_from_template_DFR[["low_correct"]]$V11,
PTID = constructs_fMRI$PTID
)
data_for_reg <- merge(data_for_reg, p200_FFA, all = TRUE)
data_for_reg <- data_for_reg[,c(2:111,1)]
name_list <- c("alpha_cue_Favg", "alpha_cue_Oz", "alpha_delay_Favg", "alpha_delay_Oz", "alpha_probe_Favg", "alpha_probe_Oz", "beta_cue_Favg", "beta_cue_Oz", "beta_delay_Favg", "beta_delay_Oz", "beta_probe_Favg", "beta_probe_Oz", "theta_cue_Favg", "theta_cue_Oz", "theta_delay_Favg", "theta_delay_Oz", "theta_probe_Favg", "theta_probe_Oz", "low_gamma_cue_Favg", "low_gamma_cue_Oz", "low_gamma_delay_Favg", "low_gamma_delay_Oz", "low_gamma_probe_Favg", "low_gamma_probe_Oz", "cue_O_n170", "probe_O_n170", "cue_P3", "probe_P3")
EEG_list <- list(alpha_cue_average_Favg, alpha_cue_average_Oz,alpha_delay_average_Favg, alpha_delay_average_Oz, alpha_probe_average_Favg, alpha_probe_average_Oz,beta_cue_average_Favg, beta_cue_average_Oz, beta_delay_average_Favg, beta_delay_average_Oz, beta_probe_average_Favg, beta_probe_average_Oz,theta_cue_average_Favg, theta_cue_average_Oz, theta_delay_average_Favg, theta_delay_average_Oz, theta_probe_average_Favg, theta_probe_average_Oz,low_gamma_cue_average_Favg, low_gamma_cue_average_Oz,low_gamma_delay_average_Favg, low_gamma_delay_average_Oz, low_gamma_probe_average_Favg, low_gamma_probe_average_Oz,cue_average_O_n170, probe_average_O_n170, cue_average_P3, probe_average_P3 )
for (idx in seq.int(1,length(EEG_list))){
colnames(EEG_list[[idx]])[2:3] <- paste0(name_list[idx], "_", colnames(EEG_list[[idx]])[2:3])
EEG_list[[idx]] <- EEG_list[[idx]][1:3]
}
EEG_df <- Reduce(function(x,y) merge(x = x, y = y, all=TRUE), EEG_list)
EEG_df <- merge(EEG_df, CDA[,1:10], all.x=TRUE, by.x="PTID", by.y="subID")
colnames(aparc_LH_MTHICK)[2:37] <- paste0(colnames(aparc_LH_MTHICK)[2:37], "_LH")
colnames(aparc_RH_MTHICK)[2:37] <- paste0(colnames(aparc_RH_MTHICK)[2:37], "_RH")
colnames(FA_Data)[1] <- "ID"
struc_df <- Reduce(function(x,y) merge(x = x, y = y, by="ID"),
list(aparc_LH_MTHICK, aparc_RH_MTHICK,aseg))
# remove STD from FA
FA_Data <- FA_Data[,1:21]
FA_Data <- data.frame(FA_Data)
RS_df <- Reduce(function(x,y) merge(x=x, y=y, by = "PTID"),
list(p200_BCT_forCorr,p200_indiv_network_ParticCoeff, p200_all_RS))
RS_df <- data.frame(RS_df)
From these plots, we can see that our measures are highly correlated.
pairs.panels(data_for_reg[,c(4,6,8,10,12,14,16,18)], density=TRUE)
pairs.panels(data_for_reg[,c(5,7,9,11,13,15,17,19)], density=TRUE)
pairs.panels(data_for_reg[,c(22:29)], density= TRUE)
pairs.panels(data_for_reg[,c(39:44,49:53)])
pairs.panels(data_for_reg[,c(45:47,53:56)])
pairs.panels(data_for_reg[,c(57:59,63:65,69:71)])
pairs.panels(data_for_reg[,c(60:62,66:68,72:74)])
pairs.panels(data_for_reg[,c(75:77,81:83,87:89)])
pairs.panels(data_for_reg[,c(78:80,84:86,90:92)])
We know that our variables are highly correlated, so to deal with multicollinearity, we’re going to run a PCA on all the encoding load effects, delay load effects, and encoding - delay differences. It looks like the first 3 dimensions carry most of the variance (~70%), so we’re going to stick with those three.
PC1: encoding load effects, the combined encoding + (encoding - delay) measure in the PFC regions, and the load effect during delay
PC2: delay load effect
PC3: encoding - delay in parietal regions and fusiform activity during delay
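Since we keep only the first three components, their per-subject scores can be pulled from a `prcomp` fit like so (toy data here; in the real analysis `res.pca` is fit on the load-effect columns):

```r
# Toy PCA: extract scores on the first three components for use as predictors
set.seed(42)
X <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)  # 100 "subjects", 6 measures
pca <- prcomp(X, scale = TRUE)
scores <- pca$x[, 1:3]   # one row per subject, columns PC1..PC3
```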
res.pca <- prcomp(data_for_reg[!is.na(data_for_reg$L_CUE_LE),c(4:19, 22:37, 106:110)], scale = TRUE)
fviz_eig(res.pca)
summary(res.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 3.8317 2.8690 1.7755 1.36044 1.16149 1.08876 0.99447
## Proportion of Variance 0.3968 0.2225 0.0852 0.05002 0.03646 0.03204 0.02673
## Cumulative Proportion 0.3968 0.6193 0.7045 0.75450 0.79096 0.82300 0.84973
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.90027 0.80958 0.78333 0.76012 0.71656 0.65841 0.61265
## Proportion of Variance 0.02191 0.01771 0.01658 0.01562 0.01388 0.01172 0.01014
## Cumulative Proportion 0.87163 0.88935 0.90593 0.92155 0.93542 0.94714 0.95729
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.54920 0.53414 0.52176 0.47660 0.45344 0.3894 0.3700
## Proportion of Variance 0.00815 0.00771 0.00736 0.00614 0.00556 0.0041 0.0037
## Cumulative Proportion 0.96544 0.97315 0.98051 0.98664 0.99220 0.9963 1.0000
## PC22 PC23 PC24 PC25 PC26
## Standard deviation 1.001e-15 3.286e-16 3.286e-16 3.286e-16 3.286e-16
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC27 PC28 PC29 PC30 PC31
## Standard deviation 3.286e-16 3.286e-16 3.286e-16 3.286e-16 3.286e-16
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC32 PC33 PC34 PC35 PC36
## Standard deviation 3.286e-16 3.286e-16 3.286e-16 3.286e-16 3.286e-16
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC37
## Standard deviation 2.265e-16
## Proportion of Variance 0.000e+00
## Cumulative Proportion 1.000e+00
res.var <- get_pca_var(res.pca)
res.var$contrib # Contributions to the PCs
## Dim.1 Dim.2 Dim.3 Dim.4
## L_aMFG_enc 5.28856171 0.37902778 6.157961e-01 0.030928862
## L_aMFG_delay 0.87514835 7.50434194 6.055440e-01 0.039678359
## L_dlPFC_enc 5.06817204 0.14920941 1.474970e+00 0.853601769
## L_dlPFC_delay 0.90616054 7.41643706 3.362529e-02 0.518352425
## L_dMFG_enc 3.85523406 0.83486192 7.684441e-01 0.104593574
## L_dMFG_delay 0.13088997 7.22260516 3.919021e-02 1.214790066
## L_IPS_enc 4.72681474 1.17282759 7.032161e-01 0.240384223
## L_IPS_delay 1.09744544 7.85264082 9.672027e-02 0.195400514
## L_preSMA_enc 5.05333421 0.26759021 1.130414e+00 1.019022415
## L_preSMA_delay 0.89809066 6.88015495 5.319551e-01 0.008823488
## R_dlPFC_enc 4.73935292 0.46232139 6.660689e-01 1.053421889
## R_dlPFC_delay 0.49515939 7.85774530 8.012236e-04 2.625527821
## R_IPS_enc 4.66486401 0.83847976 2.642182e+00 1.514621778
## R_IPS_delay 1.36497712 7.09776775 2.901157e-01 0.292800119
## R_medPar_enc 3.31551649 0.76575484 6.344154e+00 2.466090678
## R_medPar_delay 0.96321486 5.64743173 1.919158e-01 2.370647978
## L_aMFG_enc_delay 2.36929567 4.42896532 2.672597e+00 0.153234907
## L_dlPFC_enc_delay 2.59163366 4.78890802 2.369753e+00 0.118092733
## L_dMFG_enc_delay 2.48207674 3.53125606 1.158124e+00 0.675105861
## L_IPS_enc_delay 2.61058199 3.35530941 2.102327e+00 1.319237730
## L_preSMA_enc_delay 2.80635222 3.28843831 3.493193e+00 1.114055576
## R_dlPFC_enc_delay 3.06685615 3.87068237 7.844988e-01 0.183640832
## R_IPS_enc_delay 1.78102740 3.71978587 6.944083e+00 0.816801621
## R_medPar_enc_delay 1.10671832 2.64765832 1.201193e+01 0.017176427
## L_aMFG_enc_delay_comb 4.88816300 0.62517072 1.860527e+00 0.102057952
## L_dlPFC_enc_delay_comb 4.76316585 0.85743222 2.345169e+00 0.530553472
## L_dMFG_enc_delay_comb 4.28945870 0.32556806 1.307867e+00 0.086371746
## L_IPS_enc_delay_comb 4.87986967 0.04436567 1.595501e+00 0.788727952
## L_preSMA_enc_delay_comb 4.77915244 0.40562367 2.534902e+00 1.297075970
## R_dlPFC_enc_delay_comb 4.92923812 0.40923925 9.098191e-01 0.145731304
## R_IPS_enc_delay_comb 4.29112476 0.16765401 5.764064e+00 1.573412492
## R_medPar_enc_delay_comb 2.90033053 0.09957716 1.172958e+01 1.115201299
## L_DELAY_LE 0.05009231 2.05525124 8.483684e+00 0.054992419
## L_PROBE_LE 0.34549630 0.26018331 1.996965e+00 36.768046879
## R_CUE_LE 1.28086178 0.12786195 2.991150e+00 4.079700415
## R_DELAY_LE 0.01481527 2.47568695 7.542708e+00 0.200204539
## R_PROBE_LE 0.33075262 0.16618447 3.266446e+00 34.311891921
## Dim.5 Dim.6 Dim.7 Dim.8
## L_aMFG_enc 7.235748e-01 3.333941e-01 0.076352007 1.232438e-01
## L_aMFG_delay 3.509194e+00 3.505166e+00 0.191677299 8.052987e-01
## L_dlPFC_enc 2.247951e+00 1.214441e+00 0.178245671 6.168010e-01
## L_dlPFC_delay 3.096310e+00 1.564725e+00 0.041271626 3.638735e-01
## L_dMFG_enc 7.555407e-01 1.962421e+00 21.386992431 2.233940e-04
## L_dMFG_delay 5.663422e-02 7.538875e+00 0.880083005 2.609825e-01
## L_IPS_enc 2.746733e+00 2.718373e+00 0.903965109 3.606078e+00
## L_IPS_delay 1.107531e-01 8.913786e-04 1.010416142 2.087395e-02
## L_preSMA_enc 3.439487e+00 1.812855e+00 0.431125808 1.366172e+00
## L_preSMA_delay 4.472804e+00 1.024015e+00 0.030826740 1.029907e+00
## R_dlPFC_enc 1.553245e-01 4.610671e+00 6.130948222 4.560212e-02
## R_dlPFC_delay 1.955810e-01 6.977731e-01 3.109724574 2.896851e-01
## R_IPS_enc 2.813481e+00 3.283540e-02 1.481955541 6.846243e-02
## R_IPS_delay 2.573756e-02 3.684334e+00 2.006973302 1.436737e+00
## R_medPar_enc 2.645726e-02 8.054388e+00 1.432804500 4.746231e+00
## R_medPar_delay 7.548182e-01 1.466718e+01 0.452721314 2.901865e+00
## L_aMFG_enc_delay 9.503316e-01 1.610421e+00 0.549479290 1.655631e+00
## L_dlPFC_enc_delay 1.337411e-04 1.048503e-03 0.448037126 9.034739e-02
## L_dMFG_enc_delay 1.232796e+00 2.118649e+00 13.115449973 2.965985e-01
## L_IPS_enc_delay 3.133557e+00 4.745080e+00 0.005486302 6.949513e+00
## L_preSMA_enc_delay 2.663738e-02 3.565503e-01 0.815290534 1.549722e-01
## R_dlPFC_enc_delay 1.121652e-04 2.527492e+00 1.111881389 8.115718e-02
## R_IPS_enc_delay 3.592793e+00 3.932094e+00 0.014352583 1.111219e+00
## R_medPar_enc_delay 6.094053e-01 8.755190e-01 0.436380784 2.007792e+01
## L_aMFG_enc_delay_comb 9.338731e-04 1.255883e-01 0.326366141 8.378011e-01
## L_dlPFC_enc_delay_comb 7.773512e-01 4.483822e-01 0.363823456 3.889196e-01
## L_dMFG_enc_delay_comb 1.344132e+00 1.406865e-03 23.287538758 1.079815e-01
## L_IPS_enc_delay_comb 3.809198e+00 4.618671e+00 0.424499095 6.444259e+00
## L_preSMA_enc_delay_comb 1.372867e+00 1.207308e+00 0.727558988 7.906878e-01
## R_dlPFC_enc_delay_comb 5.733618e-02 4.481425e+00 4.113548089 5.927253e-04
## R_IPS_enc_delay_comb 4.211532e+00 8.238112e-01 0.514702925 1.418843e-01
## R_medPar_enc_delay_comb 9.791675e-02 1.651287e+00 1.216770171 1.406914e+01
## L_DELAY_LE 2.494714e+01 2.679230e-01 2.602743312 2.265574e+00
## L_PROBE_LE 1.866535e+00 3.968173e-01 0.547784215 1.788630e+00
## R_CUE_LE 2.880839e-01 1.493516e+01 6.995797414 1.730693e+01
## R_DELAY_LE 2.373783e+01 4.439149e-01 2.612406526 7.324654e+00
## R_PROBE_LE 2.812990e+00 1.009115e+00 0.024019640 4.335506e-01
## Dim.9 Dim.10 Dim.11 Dim.12
## L_aMFG_enc 4.7403682128 3.435853901 0.645747346 1.258687e+01
## L_aMFG_delay 0.1022843459 0.275358333 4.452872554 7.565803e-01
## L_dlPFC_enc 6.1339999504 0.001158843 4.795202808 1.266258e-01
## L_dlPFC_delay 0.4750592954 0.058596715 4.619899062 5.614144e-01
## L_dMFG_enc 0.6115759785 0.488957205 12.883977103 2.401838e-02
## L_dMFG_delay 8.2619717943 5.383452106 24.885901479 4.646307e-01
## L_IPS_enc 1.8015282819 0.122223628 1.295426406 1.163041e-02
## L_IPS_delay 0.7862711121 0.133244429 5.303409596 6.151801e-01
## L_preSMA_enc 1.4003851037 0.191712331 0.007564968 3.940104e+00
## L_preSMA_delay 0.0942600343 4.455330527 0.113601695 6.849629e+00
## R_dlPFC_enc 5.0519867188 2.015592989 2.943127200 3.239123e-02
## R_dlPFC_delay 0.2780421437 0.336534587 2.515110290 5.714481e+00
## R_IPS_enc 0.3657507564 4.427474332 0.086977127 9.519518e-01
## R_IPS_delay 1.7802120684 1.901354683 0.006643663 7.082740e-01
## R_medPar_enc 0.0073671973 2.814297534 0.079022550 1.460115e-01
## R_medPar_delay 1.5769303534 1.537065589 3.522246932 3.768368e-02
## L_aMFG_enc_delay 7.1619046996 2.168272245 1.600779663 8.728553e+00
## L_dlPFC_enc_delay 4.4762643940 0.073575206 0.150571423 1.037054e-01
## L_dMFG_enc_delay 4.8565707177 9.614501685 2.582926611 7.410024e-01
## L_IPS_enc_delay 0.5184295088 0.768447518 1.398669226 5.876767e-01
## L_preSMA_enc_delay 1.1178755869 5.880622444 0.044029170 2.150467e+01
## R_dlPFC_enc_delay 9.1711846470 1.048735441 0.141530671 4.516625e+00
## R_IPS_enc_delay 0.6156373361 1.047192532 0.211103016 5.853498e-02
## R_medPar_enc_delay 1.7021435003 0.352241279 3.130491863 5.476547e-02
## L_aMFG_enc_delay_comb 7.6332142429 3.645881995 0.047530351 1.390815e+01
## L_dlPFC_enc_delay_comb 6.6712276236 0.026661043 2.254121456 1.663086e-03
## L_dMFG_enc_delay_comb 0.7023760846 4.969722200 1.323711477 3.552705e-01
## L_IPS_enc_delay_comb 1.5025960407 0.438129185 0.021847291 9.700450e-02
## L_preSMA_enc_delay_comb 1.5419930981 2.291628063 0.003340600 1.272759e+01
## R_dlPFC_enc_delay_comb 8.6393182886 1.919118071 1.481916154 1.040484e+00
## R_IPS_enc_delay_comb 0.0003204776 3.516387721 0.181450002 5.616918e-01
## R_medPar_enc_delay_comb 0.4093412217 1.874856079 0.584018209 1.328134e-01
## L_DELAY_LE 1.5050236585 3.965963478 8.009741706 7.716488e-04
## L_PROBE_LE 0.3603784552 0.363208307 2.056479269 5.406233e-01
## R_CUE_LE 4.7721446400 15.904188758 6.070594026 3.560886e-01
## R_DELAY_LE 1.5194951649 4.712187595 0.497032249 5.162128e-04
## R_PROBE_LE 1.6545672660 7.840271421 0.051384787 4.543332e-01
## Dim.13 Dim.14 Dim.15 Dim.16
## L_aMFG_enc 0.004942722 6.111994e-01 0.291138336 2.905130844
## L_aMFG_delay 0.181753977 3.351148e-01 4.950751265 1.767548528
## L_dlPFC_enc 0.015390615 1.033148e+01 2.453533077 0.005971895
## L_dlPFC_delay 6.425510247 8.386005e-04 21.251019408 1.861595358
## L_dMFG_enc 1.524325953 1.930649e-01 0.007267323 1.358969381
## L_dMFG_delay 3.975753296 2.752255e-01 0.172521182 0.026159430
## L_IPS_enc 4.979903324 2.298592e+00 0.823857605 0.033922025
## L_IPS_delay 17.389288373 2.053338e+00 2.671671575 0.379212169
## L_preSMA_enc 3.101910349 3.816055e+00 0.052787959 12.487445744
## L_preSMA_delay 6.937524297 2.110274e+00 2.203359389 24.912742307
## R_dlPFC_enc 4.878489724 1.163335e+00 0.460779110 3.864602622
## R_dlPFC_delay 14.078496871 5.076940e+00 2.082044401 9.574100241
## R_IPS_enc 1.092678372 7.211521e-02 1.362776200 0.385554152
## R_IPS_delay 6.156503926 4.590853e-01 3.272504828 2.115890483
## R_medPar_enc 0.448502986 2.437865e+00 0.013030811 4.198369891
## R_medPar_delay 0.564597503 5.932819e+00 1.386958255 15.133143705
## L_aMFG_enc_delay 0.257080608 6.734976e-02 2.777688648 0.247644894
## L_dlPFC_enc_delay 5.295426102 1.280982e+01 7.184248530 1.960774257
## L_dMFG_enc_delay 0.707483929 1.155771e-02 0.119735913 0.980893750
## L_IPS_enc_delay 3.685161471 9.897426e-02 9.424875883 0.223249731
## L_preSMA_enc_delay 0.173809325 7.752942e-01 1.204344655 0.322789336
## R_dlPFC_enc_delay 1.393031532 1.155976e+01 0.415738729 0.658849605
## R_IPS_enc_delay 2.425905071 1.237197e+00 12.448333226 0.815456690
## R_medPar_enc_delay 0.002523359 7.794713e-01 1.401378711 3.745491697
## L_aMFG_enc_delay_comb 0.102586967 3.680977e-01 0.354366156 1.651974642
## L_dlPFC_enc_delay_comb 1.301360680 1.437740e+01 0.236988180 0.615630974
## L_dMFG_enc_delay_comb 0.051253661 3.734515e-02 0.023562446 1.592558876
## L_IPS_enc_delay_comb 0.238394024 1.291970e+00 4.466231543 0.013423942
## L_preSMA_enc_delay_comb 0.666228469 2.564860e+00 0.188574185 3.157054979
## R_dlPFC_enc_delay_comb 0.468800965 5.966504e+00 0.003706432 0.541712950
## R_IPS_enc_delay_comb 0.018194213 5.566510e-01 6.597777736 0.004716956
## R_medPar_enc_delay_comb 0.155178364 2.613717e-01 0.311079446 0.066520739
## L_DELAY_LE 0.888118260 4.903343e-01 2.050408051 1.374975915
## L_PROBE_LE 3.179673732 4.594869e+00 5.356561963 0.294915270
## R_CUE_LE 1.965964339 4.921954e+00 0.352585656 0.324648545
## R_DELAY_LE 4.677800799 6.132269e-02 0.356659522 0.386063646
## R_PROBE_LE 0.590451596 5.498393e-04 1.269153663 0.010293830
## Dim.17 Dim.18 Dim.19 Dim.20
## L_aMFG_enc 0.819956252 6.712597431 2.771298e+00 0.998570027
## L_aMFG_delay 0.273219736 29.775191172 4.065740e+00 3.853551882
## L_dlPFC_enc 1.284275486 4.502267555 2.619560e-01 0.059085349
## L_dlPFC_delay 11.099093905 7.404365417 2.038516e-01 2.884395109
## L_dMFG_enc 0.134104230 0.292675715 3.820262e-03 0.023880504
## L_dMFG_delay 1.145062169 1.353966336 1.469560e+00 0.014215902
## L_IPS_enc 1.838011228 0.506583016 2.878481e+00 5.116167037
## L_IPS_delay 0.538124090 0.374162829 4.493794e+00 11.833215959
## L_preSMA_enc 1.298700757 0.042171262 2.843870e-02 0.041547328
## L_preSMA_delay 5.621474969 0.381659055 2.472948e+00 1.641858268
## R_dlPFC_enc 0.312594419 0.505476602 3.239705e-01 2.625838621
## R_dlPFC_delay 1.456635766 0.587259284 1.875881e+00 10.988572520
## R_IPS_enc 6.148053628 0.126717662 2.725024e-01 9.149755841
## R_IPS_delay 1.052214499 6.006016181 5.780177e-01 13.016592430
## R_medPar_enc 4.608332345 0.008462164 6.838228e-01 0.100805581
## R_medPar_delay 5.430198824 1.331317870 2.198332e-01 0.393882881
## L_aMFG_enc_delay 2.271162545 7.384452807 5.749725e-02 0.818967758
## L_dlPFC_enc_delay 3.746795099 0.056930874 1.950862e-02 1.855828013
## L_dMFG_enc_delay 0.556289874 0.447806337 1.738432e+00 0.076819149
## L_IPS_enc_delay 6.816998649 2.665622283 5.997146e-02 1.073549088
## L_preSMA_enc_delay 0.773212073 0.635289193 2.652054e+00 0.885688106
## R_dlPFC_enc_delay 3.241133355 2.376054280 4.845638e-01 1.996637779
## R_IPS_enc_delay 3.590798857 5.651312117 5.157355e-02 0.158866669
## R_medPar_enc_delay 0.005381931 1.400098416 2.041656e-01 1.161573191
## L_aMFG_enc_delay_comb 1.864661210 0.002046960 7.337314e-01 0.009232233
## L_dlPFC_enc_delay_comb 0.122736111 1.278020402 1.419456e-01 0.330497238
## L_dMFG_enc_delay_comb 0.050354381 0.005986440 6.569175e-01 0.063993351
## L_IPS_enc_delay_comb 4.729655312 1.617959816 9.509130e-01 0.897181380
## L_preSMA_enc_delay_comb 0.045000619 0.284704030 8.894349e-01 0.135726040
## R_dlPFC_enc_delay_comb 1.654498756 1.535427103 9.844379e-04 0.045045203
## R_IPS_enc_delay_comb 6.581460438 0.998618246 4.705840e-02 2.994843237
## R_medPar_enc_delay_comb 1.714325433 0.326472279 5.770632e-01 0.596728250
## L_DELAY_LE 0.036919945 3.692173297 2.187440e+01 2.484354457
## L_PROBE_LE 0.978943492 0.610263014 1.534934e+01 7.624649298
## R_CUE_LE 16.104849513 1.000994101 5.369060e-02 0.113091628
## R_DELAY_LE 0.289977858 7.795232109 1.465331e+01 3.084229090
## R_PROBE_LE 1.764792245 0.323646347 1.619954e+01 10.850563603
## Dim.21 Dim.22 Dim.23 Dim.24
## L_aMFG_enc 0.144577003 2.491759e+01 0.000000e+00 0.000000e+00
## L_aMFG_delay 0.075646201 1.200522e+01 4.045232e-01 2.984372e-01
## L_dlPFC_enc 0.474113707 1.045908e+00 2.845963e+00 9.993705e+00
## L_dlPFC_delay 0.017973076 7.575030e-01 9.521463e-01 7.563448e+00
## L_dMFG_enc 0.181426138 5.075256e-01 4.313550e+00 1.761025e+00
## L_dMFG_delay 0.111615408 6.516712e-01 3.863739e+00 7.559816e-02
## L_IPS_enc 1.957235908 7.519844e+00 1.289790e+00 7.704619e-01
## L_IPS_delay 9.258832739 1.141002e+00 7.800398e-01 6.043188e+00
## L_preSMA_enc 0.279280837 3.954356e+00 1.759697e+00 1.062290e+01
## L_preSMA_delay 0.374055540 8.970408e-01 2.374211e-01 5.470634e+00
## R_dlPFC_enc 0.466731299 1.190213e+00 1.331297e+01 1.821224e+00
## R_dlPFC_delay 0.455025389 7.506348e-01 3.762975e+00 1.847291e-01
## R_IPS_enc 3.340854305 1.778964e-01 9.778525e-01 2.886026e+00
## R_IPS_delay 12.069030627 3.032558e-02 8.015839e+00 9.008005e-01
## R_medPar_enc 0.227226864 3.801884e-01 2.552165e-01 5.437817e-02
## R_medPar_delay 0.377130643 2.707326e-01 5.745155e-02 1.449692e-02
## L_aMFG_enc_delay 0.474699835 4.992457e+00 1.582767e+00 1.167687e+00
## L_dlPFC_enc_delay 0.414240844 7.395274e+00 2.743363e-01 8.387683e+00
## L_dMFG_enc_delay 0.005884252 7.053250e-01 2.891606e+00 3.482193e+00
## L_IPS_enc_delay 2.886477128 6.823045e-02 4.281912e-01 1.279774e+01
## L_preSMA_enc_delay 0.001468519 9.796320e-02 5.013371e+00 4.988863e+00
## R_dlPFC_enc_delay 0.011654823 6.486993e-01 5.264503e-01 1.041924e-01
## R_IPS_enc_delay 2.989771940 1.392165e-03 1.702240e+01 7.864644e-02
## R_medPar_enc_delay 0.015505696 2.106519e+00 2.204927e-07 3.117851e-04
## L_aMFG_enc_delay_comb 0.365121335 4.711160e+00 1.311019e+00 9.672050e-01
## L_dlPFC_enc_delay_comb 0.558483778 1.183691e+01 8.615898e-01 5.610230e-03
## L_dMFG_enc_delay_comb 0.086202313 1.091623e-02 1.102825e-01 7.428880e+00
## L_IPS_enc_delay_comb 0.003567181 5.680499e+00 4.852669e-02 8.517459e+00
## L_preSMA_enc_delay_comb 0.108658881 1.944313e+00 1.083106e+01 3.918407e-01
## R_dlPFC_enc_delay_comb 0.213982551 2.555272e-02 5.714920e+00 2.078399e+00
## R_IPS_enc_delay_comb 0.076977051 1.336515e-01 1.039293e+01 1.113664e+00
## R_medPar_enc_delay_comb 0.054378258 3.443476e+00 1.613786e-01 2.857113e-02
## L_DELAY_LE 12.899412631 2.530792e-31 4.814825e-31 7.225247e-31
## L_PROBE_LE 14.719639212 1.203706e-31 4.627047e-30 2.023430e-30
## R_CUE_LE 0.053659280 4.814825e-33 1.012317e-30 1.560003e-30
## R_DELAY_LE 17.613953448 1.083336e-30 2.708339e-31 2.330375e-30
## R_PROBE_LE 16.635505357 1.925930e-32 8.137054e-31 4.814825e-33
## Dim.25 Dim.26 Dim.27 Dim.28
## L_aMFG_enc 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## L_aMFG_delay 7.166360e-01 8.910139e-01 6.073838e-02 3.034881e-01
## L_dlPFC_enc 3.806208e+00 1.273468e+00 1.500117e+00 7.174468e+00
## L_dlPFC_delay 8.066024e-05 6.077014e-02 1.621906e-01 3.335568e+00
## L_dMFG_enc 2.305233e-01 6.726809e-02 3.681434e-02 3.008419e-03
## L_dMFG_delay 4.941686e+00 4.579809e-01 3.928729e-01 3.889499e-01
## L_IPS_enc 5.379813e+00 7.804933e-03 1.745099e-01 2.079653e+00
## L_IPS_delay 9.413069e-01 5.468281e-02 7.543571e-01 2.469022e+00
## L_preSMA_enc 7.697369e+00 3.927005e+00 1.461034e+00 7.817103e-01
## L_preSMA_delay 1.873093e+00 1.425819e+00 3.080841e-01 1.035822e+00
## R_dlPFC_enc 1.052322e-03 3.407421e-04 8.334136e+00 8.670113e-02
## R_dlPFC_delay 3.683579e-02 9.533498e-05 5.393603e+00 1.852654e+00
## R_IPS_enc 4.051594e+00 6.056658e-01 1.116070e+01 7.408948e+00
## R_IPS_delay 4.991417e+00 4.595268e+00 4.988216e+00 1.856445e+00
## R_medPar_enc 3.093183e+00 4.858458e-01 2.387531e+01 6.775718e+00
## R_medPar_delay 3.531411e-02 4.515128e-01 1.480371e+01 1.360775e+01
## L_aMFG_enc_delay 2.803962e+00 3.486245e+00 2.376494e-01 1.187449e+00
## L_dlPFC_enc_delay 3.087763e+00 2.303524e+00 6.492829e-02 1.980200e+00
## L_dMFG_enc_delay 1.447126e+01 1.090229e+00 1.965639e+00 1.317025e+00
## L_IPS_enc_delay 1.113415e-02 1.132783e-01 1.395132e+00 2.589006e+00
## L_preSMA_enc_delay 2.915070e-01 1.904942e+01 2.104839e-02 2.085902e+00
## R_dlPFC_enc_delay 1.335176e-01 1.336617e-03 4.807039e+00 6.420336e+00
## R_IPS_enc_delay 5.115662e+00 9.610283e+00 1.415350e+00 3.036904e-02
## R_medPar_enc_delay 1.329882e+00 3.723123e-01 7.486350e+00 1.923286e+01
## L_aMFG_enc_delay_comb 2.322545e+00 2.887687e+00 1.968470e-01 9.835743e-01
## L_dlPFC_enc_delay_comb 1.097285e+01 5.717370e+00 1.627402e+00 8.618490e-01
## L_dMFG_enc_delay_comb 7.991258e+00 4.443522e-01 1.843850e+00 8.647555e-01
## L_IPS_enc_delay_comb 3.609451e+00 7.296358e-02 7.476399e-01 2.527401e-01
## L_preSMA_enc_delay_comb 3.407232e+00 3.480867e+01 7.941199e-01 4.080296e-01
## R_dlPFC_enc_delay_comb 1.006430e-01 2.503249e-03 1.334176e-01 4.585257e+00
## R_IPS_enc_delay_comb 3.825526e-01 5.734929e+00 2.123182e+00 3.830921e+00
## R_medPar_enc_delay_comb 6.172671e+00 3.573775e-04 1.734010e+00 4.209825e+00
## L_DELAY_LE 3.277090e-31 5.484386e-32 4.701977e-32 3.130840e-30
## L_PROBE_LE 3.900008e-31 2.891904e-31 1.647874e-30 1.738152e-30
## R_CUE_LE 1.203706e-31 1.203706e-31 4.478991e-30 2.547042e-30
## R_DELAY_LE 3.009266e-32 3.478711e-31 6.093763e-31 8.902611e-30
## R_PROBE_LE 1.733337e-31 5.825938e-31 3.009266e-32 7.703720e-32
## Dim.29 Dim.30 Dim.31 Dim.32
## L_aMFG_enc 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## L_aMFG_delay 4.435744e-01 7.858842e-01 8.959597e-01 1.859400e-01
## L_dlPFC_enc 1.954221e+01 2.636715e+00 9.655867e-02 1.607464e+00
## L_dlPFC_delay 9.761216e+00 1.256853e-02 1.537981e+00 7.817782e-01
## L_dMFG_enc 9.025525e+00 1.944740e+01 1.285796e+00 1.015507e+00
## L_dMFG_delay 5.154529e+00 2.244502e+00 3.580931e+00 1.091253e+00
## L_IPS_enc 1.384053e+00 9.170298e+00 1.191494e-02 1.542111e+01
## L_IPS_delay 1.520974e+00 5.490685e+00 2.037201e+00 1.474735e+00
## L_preSMA_enc 3.416075e-01 2.439720e+00 6.973070e+00 5.011995e+00
## L_preSMA_delay 6.394295e-01 4.395822e-01 6.338308e+00 1.961774e+00
## R_dlPFC_enc 4.604881e-01 1.369244e+00 1.253289e+00 6.890838e+00
## R_dlPFC_delay 8.835225e-01 7.023752e-01 4.176843e+00 7.928372e-01
## R_IPS_enc 4.008396e+00 7.789292e-01 7.080140e-02 2.689563e+00
## R_IPS_delay 8.412724e-02 1.336861e+00 6.919868e-02 2.130348e-03
## R_medPar_enc 6.446743e-02 1.876001e+00 1.620577e-02 1.461672e+00
## R_medPar_delay 8.375948e-01 1.162606e+00 5.432419e-02 1.084159e+00
## L_aMFG_enc_delay 1.735561e+00 3.074907e+00 3.505596e+00 7.275223e-01
## L_dlPFC_enc_delay 6.508079e+00 1.474067e+00 8.140890e+00 4.998376e-01
## L_dMFG_enc_delay 1.835276e+00 2.400959e+00 6.291471e+00 1.000762e+00
## L_IPS_enc_delay 1.509437e+00 2.973725e+00 6.517631e+00 8.453791e-01
## L_preSMA_enc_delay 1.518343e+00 4.736037e-03 1.005845e+01 1.174450e+00
## R_dlPFC_enc_delay 1.755906e+00 4.576263e-01 1.023511e+01 2.663799e-01
## R_IPS_enc_delay 4.489747e+00 1.687031e+00 4.515973e-01 1.543214e+00
## R_medPar_enc_delay 3.421372e+00 5.874843e-01 2.746722e-01 8.313675e+00
## L_aMFG_enc_delay_comb 1.437580e+00 2.546972e+00 2.903715e+00 6.026129e-01
## L_dlPFC_enc_delay_comb 1.730574e+00 6.358702e+00 8.797607e+00 1.605363e-01
## L_dMFG_enc_delay_comb 2.019292e+00 2.598695e+01 1.353903e+00 1.416074e-04
## L_IPS_enc_delay_comb 1.060189e-01 3.695065e-01 7.030869e+00 1.569347e+01
## L_preSMA_enc_delay_comb 4.759262e-01 1.602739e+00 6.501637e-01 7.472820e-01
## R_dlPFC_enc_delay_comb 4.593310e-01 1.212209e-01 4.299479e+00 7.283939e+00
## R_IPS_enc_delay_comb 1.307646e+01 3.234054e-01 7.374202e-01 6.171262e+00
## R_medPar_enc_delay_comb 3.769383e+00 1.365917e-01 3.530396e-01 1.349677e+01
## L_DELAY_LE 2.437505e-30 3.081488e-31 1.807440e-32 3.510007e-30
## L_PROBE_LE 3.900008e-31 4.814825e-33 4.814825e-33 3.900008e-31
## R_CUE_LE 1.560003e-30 3.254822e-30 5.085659e-30 7.136774e-30
## R_DELAY_LE 3.900008e-31 6.933348e-31 4.701977e-32 1.203706e-33
## R_PROBE_LE 3.510007e-30 3.478711e-31 1.806312e-31 5.403437e-30
## Dim.33 Dim.34 Dim.35 Dim.36
## L_aMFG_enc 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## L_aMFG_delay 4.427554e+00 1.880998e+00 2.321548e-01 3.075290e-01
## L_dlPFC_enc 2.708211e+00 1.036987e-01 2.199278e+00 3.726118e-01
## L_dlPFC_delay 2.026855e+00 1.569075e-01 3.277099e-01 1.146939e+00
## L_dMFG_enc 2.168059e-01 3.554496e+00 5.310239e+00 5.418119e+00
## L_dMFG_delay 3.826961e-03 8.931840e-04 6.130452e+00 5.610559e+00
## L_IPS_enc 3.211794e-03 6.839875e-01 3.365702e+00 6.180774e+00
## L_IPS_delay 2.123239e-01 1.366030e+00 5.902072e+00 2.675705e+00
## L_preSMA_enc 3.036762e+00 3.899460e+00 3.556822e+00 1.337515e-01
## L_preSMA_delay 1.776582e+00 4.242897e-01 3.394650e+00 7.477198e-03
## R_dlPFC_enc 9.302562e-01 1.470640e+01 4.657563e+00 1.515112e+00
## R_dlPFC_delay 1.737174e+00 3.682869e+00 3.972594e-01 4.748050e+00
## R_IPS_enc 1.458469e+00 1.535464e-01 1.596501e+00 1.999135e+01
## R_IPS_delay 2.548810e+00 1.204240e-01 1.457366e+00 3.656490e+00
## R_medPar_enc 1.141213e+01 4.794173e+00 4.660134e-01 1.757839e+00
## R_medPar_delay 4.119013e-01 1.217734e+00 2.874940e-01 9.089988e-02
## L_aMFG_enc_delay 1.732357e+01 7.359727e+00 9.083457e-01 1.203260e+00
## L_dlPFC_enc_delay 2.223300e+00 2.872251e-01 1.750374e-02 2.825268e+00
## L_dMFG_enc_delay 1.220655e-01 3.823562e+00 6.030721e+00 4.952132e+00
## L_IPS_enc_delay 5.677660e-01 7.064158e+00 7.792901e+00 8.419738e-01
## L_preSMA_enc_delay 1.888727e+00 1.034746e-01 5.579926e+00 1.803881e-02
## R_dlPFC_enc_delay 3.409512e+00 2.667911e-01 1.039266e+01 1.140855e+01
## R_IPS_enc_delay 3.248232e+00 8.453655e-01 1.167065e+00 7.868104e-02
## R_medPar_enc_delay 2.935836e+00 1.381320e-02 1.442797e-01 3.408844e-01
## L_aMFG_enc_delay_comb 1.434926e+01 6.096125e+00 7.523904e-01 9.966706e-01
## L_dlPFC_enc_delay_comb 5.423634e-04 5.565893e-02 1.897897e+00 1.160586e+00
## L_dMFG_enc_delay_comb 4.849584e-01 1.075859e+01 1.343891e-02 1.003448e-02
## L_IPS_enc_delay_comb 5.096030e-01 1.097080e+01 1.927426e+00 9.999080e-01
## L_preSMA_enc_delay_comb 2.591972e-02 3.978785e+00 4.482694e-01 1.949225e-01
## R_dlPFC_enc_delay_comb 8.620655e-01 7.533058e+00 2.360768e+01 4.613459e+00
## R_IPS_enc_delay_comb 6.437915e-01 1.431896e+00 3.533604e-03 1.416809e+01
## R_medPar_enc_delay_comb 1.849399e+01 2.665061e+00 3.468765e-02 2.574331e+00
## L_DELAY_LE 1.880791e-33 1.391484e-30 9.103028e-31 7.042105e-31
## L_PROBE_LE 8.137054e-31 1.264644e-31 2.034264e-31 1.880791e-33
## R_CUE_LE 3.081488e-31 3.900008e-31 3.478711e-31 4.190101e-30
## R_DELAY_LE 1.083336e-32 2.437505e-30 2.708339e-31 7.373453e-31
## R_PROBE_LE 2.225653e-30 1.692712e-32 1.194377e-30 7.981325e-31
## Dim.37
## L_aMFG_enc 3.084328e+01
## L_aMFG_delay 8.258684e+00
## L_dlPFC_enc 8.449653e-01
## L_dlPFC_delay 6.119697e-01
## L_dMFG_enc 4.100186e-01
## L_dMFG_delay 5.264705e-01
## L_IPS_enc 6.075113e+00
## L_IPS_delay 9.217897e-01
## L_preSMA_enc 3.194635e+00
## L_preSMA_delay 7.246992e-01
## R_dlPFC_enc 9.615468e-01
## R_dlPFC_delay 6.064210e-01
## R_IPS_enc 1.437185e-01
## R_IPS_delay 2.449935e-02
## R_medPar_enc 3.071457e-01
## R_medPar_delay 2.187188e-01
## L_aMFG_enc_delay 2.972876e-01
## L_dlPFC_enc_delay 5.974476e+00
## L_dMFG_enc_delay 5.698163e-01
## L_IPS_enc_delay 5.512185e-02
## L_preSMA_enc_delay 7.914228e-02
## R_dlPFC_enc_delay 5.240696e-01
## R_IPS_enc_delay 1.124699e-03
## R_medPar_enc_delay 1.701809e+00
## L_aMFG_enc_delay_comb 1.748144e+01
## L_dlPFC_enc_delay_comb 9.562777e+00
## L_dMFG_enc_delay_comb 8.818974e-03
## L_IPS_enc_delay_comb 4.589148e+00
## L_preSMA_enc_delay_comb 1.570767e+00
## R_dlPFC_enc_delay_comb 2.064347e-02
## R_IPS_enc_delay_comb 1.079740e-01
## R_medPar_enc_delay_comb 2.781908e+00
## L_DELAY_LE 1.156762e-30
## L_PROBE_LE 4.814825e-33
## R_CUE_LE 1.925930e-32
## R_DELAY_LE 3.081488e-31
## R_PROBE_LE 0.000000e+00
for (axis in seq.int(1,3)){
print(fviz_contrib(res.pca, choice = "var", axes = axis, top = 10))
}
fviz_pca_var(res.pca,
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
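As a sanity check on what fviz_contrib is plotting: for a scaled prcomp-style PCA, the contribution of a variable to a component reduces to its squared loading rescaled to a percentage, because each loading vector has unit norm. A minimal numpy sketch with made-up data (the dimensions and variable set here are illustrative, not our actual measures):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))             # 40 subjects x 6 hypothetical measures
X = (X - X.mean(0)) / X.std(0, ddof=1)   # z-score, mirroring scale = TRUE

# PCA via SVD of the scaled data matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)
loadings = Vt.T                          # columns are unit-norm PC loading vectors

# Contribution (%) of each variable to each PC: squared loading x 100;
# each column sums to 100 because the loading vectors have unit norm
contrib = 100 * loadings**2
print(contrib[:, 0])                     # contributions to PC1
```

This matches the percentages on the fviz_contrib bar plots (there computed from cos2 values, which is algebraically equivalent for a full PCA of standardized data).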
We’re going to do the same thing for the similarity measures, since we have the same issue (though not quite as severe).
The first five PCs cover about 74% of the variance, so we’ll focus on those.
PC1: Similarity within trial
PC2: Individual to template DFR trials, particularly correct trials
PC3: High-load correct trials in DFR; encoding to correct delay trials in high-load correct trials in fusiform
PC4: Similarity within high-load incorrect trials during delay and encoding
PC5: Correlation within low-load correct and high-load incorrect trials in the DFR mask
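The variance cutoff used throughout (enough PCs to reach roughly 75% of the cumulative proportion) can be computed directly from the singular values; a small numpy sketch with random stand-in data (the 18-column matrix is only a placeholder for the similarity measures):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 18))            # stand-in for the 18 similarity measures
X = (X - X.mean(0)) / X.std(0, ddof=1)   # equivalent of scale = TRUE

_, s, _ = np.linalg.svd(X, full_matrices=False)
var_explained = s**2 / np.sum(s**2)      # proportion of variance per PC
cumulative = np.cumsum(var_explained)

# Smallest number of PCs whose cumulative proportion reaches 75%
n_pcs = int(np.argmax(cumulative >= 0.75)) + 1
print(n_pcs, cumulative[:n_pcs])
```

This is the same information summary() prints in the "Cumulative Proportion" row.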
res_sim.pca <- prcomp(data_for_reg[,c(39:56)], scale = TRUE)
fviz_eig(res_sim.pca)
summary(res_sim.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.6812 1.5703 1.25092 1.07857 0.99578 0.92715 0.83264
## Proportion of Variance 0.3994 0.1370 0.08693 0.06463 0.05509 0.04776 0.03852
## Cumulative Proportion 0.3994 0.5364 0.62330 0.68793 0.74302 0.79078 0.82929
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.74977 0.70015 0.67719 0.58768 0.57033 0.50182 0.47232
## Proportion of Variance 0.03123 0.02723 0.02548 0.01919 0.01807 0.01399 0.01239
## Cumulative Proportion 0.86052 0.88776 0.91323 0.93242 0.95049 0.96448 0.97688
## PC15 PC16 PC17 PC18
## Standard deviation 0.37334 0.33247 0.30531 0.27034
## Proportion of Variance 0.00774 0.00614 0.00518 0.00406
## Cumulative Proportion 0.98462 0.99076 0.99594 1.00000
res_sim.var <- get_pca_var(res_sim.pca)
#res_sim.var$contrib # Contributions to the PCs
for (axis in seq.int(1,5)){
print(fviz_contrib(res_sim.pca, choice = "var", axes = axis, top = 10))
}
fviz_pca_var(res_sim.pca,
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
Now, let’s do the same for the MVPA measures. We end up needing 12 PCs to capture roughly 74% of the variance.
PC1: Fusiform during encoding and probe
PC2: DFR during encoding
PC3: DFR during probe, averages from template across trial type
PC4: Delay period decoding
PC5: Fusiform encoding during template; most variance comes from high-load incorrect trials during individual trials
PC6: Individual trials in fusiform, mostly during delay; mostly high-load incorrect individual trials
PC7: Individual trials, low-load correct and high-load incorrect, but most variance comes from high-load incorrect trials in fusiform during delay from averages
PC8: High-load incorrect DFR across the entire trial from individual trials
PC9: DFR decoding during correct trials in encoding and probe
PC10: Correct trials averaged from templates during delay and probe; mostly DFR during probe
PC11: Low-load correct trials, mostly delay period, with the largest variance explained from fusiform during encoding
PC12: Mostly correct trials from templates in the fusiform; most variance explained by encoding and probe
res_MVPA.pca <- prcomp(data_for_reg[,c(57:92)], scale = TRUE)
fviz_eig(res_MVPA.pca)
summary(res_MVPA.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.1575 1.9266 1.84175 1.68366 1.59799 1.56683 1.21421
## Proportion of Variance 0.1293 0.1031 0.09422 0.07874 0.07093 0.06819 0.04095
## Cumulative Proportion 0.1293 0.2324 0.32663 0.40537 0.47630 0.54450 0.58545
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.14888 1.08102 1.04333 1.00443 0.95488 0.93740 0.92004
## Proportion of Variance 0.03666 0.03246 0.03024 0.02802 0.02533 0.02441 0.02351
## Cumulative Proportion 0.62211 0.65457 0.68481 0.71284 0.73816 0.76257 0.78609
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.84687 0.81679 0.79631 0.75825 0.72385 0.6971 0.68002
## Proportion of Variance 0.01992 0.01853 0.01761 0.01597 0.01455 0.0135 0.01285
## Cumulative Proportion 0.80601 0.82454 0.84215 0.85812 0.87268 0.8862 0.89902
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.6434 0.6236 0.61003 0.59364 0.57402 0.56315 0.50385
## Proportion of Variance 0.0115 0.0108 0.01034 0.00979 0.00915 0.00881 0.00705
## Cumulative Proportion 0.9105 0.9213 0.93166 0.94145 0.95060 0.95941 0.96646
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 0.4723 0.43151 0.41615 0.37341 0.37014 0.36314 0.34177
## Proportion of Variance 0.0062 0.00517 0.00481 0.00387 0.00381 0.00366 0.00324
## Cumulative Proportion 0.9727 0.97783 0.98264 0.98651 0.99032 0.99398 0.99723
## PC36
## Standard deviation 0.31598
## Proportion of Variance 0.00277
## Cumulative Proportion 1.00000
res_MVPA.var <- get_pca_var(res_MVPA.pca)
#res_MVPA.var$contrib # Contributions to the PCs
for (axis in seq.int(1,12)){
print(fviz_contrib(res_MVPA.pca, choice = "var", axes = axis, top = 10))
}
fviz_pca_var(res_MVPA.pca,
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
For the structural data, we see that 19 PCs are necessary to explain 75% of the variance. There was a lot of overlap among these components, so they were difficult to describe and classify:
PC1: Parietal lobe structures and some superior frontal structures
PC2: Subcortical structures (including cerebellum)
PC3: Cuneus, pericalcarine, superior parietal, lingual
PC4: Temporal pole, entorhinal
PC5: Lingual, parahippocampal, pericalcarine
PC6: Corpus callosum, caudate
PC7: More subcortical structures
PC8: Frontal pole (and other frontal regions)
PC9: Corpus callosum, hippocampus and surrounding regions
PC10: Cerebellum, frontal
PC11: Parahippocampal and lateral occipital
PC12: Temporal, entorhinal, parahippocampal
PC13: Temporal pole, temporal cortex, amygdala and HPC
PC14: Pars orbitalis, cingulate, accumbens
PC15: Left-hemisphere subcortical, pars triangularis
PC16: Temporal, orbitofrontal, frontal pole
PC17: Banks of the STS, some temporal and parietal cortex
PC18: Cingulate, frontal
PC19: HPC, lateral orbitofrontal poles
res_struct.pca <- prcomp(struc_df[,c(4:37, 40:96)], scale = TRUE)
fviz_eig(res_struct.pca)
summary(res_struct.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 4.808 3.0603 2.45392 1.9873 1.78776 1.6467 1.56606
## Proportion of Variance 0.254 0.1029 0.06617 0.0434 0.03512 0.0298 0.02695
## Cumulative Proportion 0.254 0.3569 0.42311 0.4665 0.50163 0.5314 0.55838
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.49575 1.36542 1.33629 1.29783 1.21173 1.19815 1.16793
## Proportion of Variance 0.02459 0.02049 0.01962 0.01851 0.01614 0.01578 0.01499
## Cumulative Proportion 0.58296 0.60345 0.62307 0.64158 0.65772 0.67349 0.68848
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 1.15174 1.12525 1.06997 1.04709 1.02367 0.98047 0.96024
## Proportion of Variance 0.01458 0.01391 0.01258 0.01205 0.01152 0.01056 0.01013
## Cumulative Proportion 0.70306 0.71698 0.72956 0.74160 0.75312 0.76368 0.77382
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.94282 0.92335 0.90870 0.8998 0.83794 0.82785 0.81884
## Proportion of Variance 0.00977 0.00937 0.00907 0.0089 0.00772 0.00753 0.00737
## Cumulative Proportion 0.78358 0.79295 0.80203 0.8109 0.81864 0.82617 0.83354
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 0.81073 0.79640 0.77974 0.75909 0.74232 0.72511 0.71232
## Proportion of Variance 0.00722 0.00697 0.00668 0.00633 0.00606 0.00578 0.00558
## Cumulative Proportion 0.84076 0.84773 0.85441 0.86075 0.86680 0.87258 0.87815
## PC36 PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 0.70242 0.69823 0.68353 0.67518 0.66311 0.65209 0.64481
## Proportion of Variance 0.00542 0.00536 0.00513 0.00501 0.00483 0.00467 0.00457
## Cumulative Proportion 0.88358 0.88893 0.89407 0.89908 0.90391 0.90858 0.91315
## PC43 PC44 PC45 PC46 PC47 PC48 PC49
## Standard deviation 0.63751 0.60464 0.60274 0.58650 0.5720 0.56529 0.55961
## Proportion of Variance 0.00447 0.00402 0.00399 0.00378 0.0036 0.00351 0.00344
## Cumulative Proportion 0.91762 0.92163 0.92563 0.92941 0.9330 0.93651 0.93996
## PC50 PC51 PC52 PC53 PC54 PC55 PC56
## Standard deviation 0.55386 0.54381 0.53336 0.51437 0.50853 0.49775 0.48243
## Proportion of Variance 0.00337 0.00325 0.00313 0.00291 0.00284 0.00272 0.00256
## Cumulative Proportion 0.94333 0.94658 0.94970 0.95261 0.95545 0.95817 0.96073
## PC57 PC58 PC59 PC60 PC61 PC62 PC63
## Standard deviation 0.46643 0.45147 0.44692 0.43848 0.43119 0.41933 0.40997
## Proportion of Variance 0.00239 0.00224 0.00219 0.00211 0.00204 0.00193 0.00185
## Cumulative Proportion 0.96312 0.96536 0.96756 0.96967 0.97171 0.97365 0.97549
## PC64 PC65 PC66 PC67 PC68 PC69 PC70
## Standard deviation 0.40208 0.38511 0.38452 0.37470 0.36597 0.35508 0.34540
## Proportion of Variance 0.00178 0.00163 0.00162 0.00154 0.00147 0.00139 0.00131
## Cumulative Proportion 0.97727 0.97890 0.98052 0.98207 0.98354 0.98492 0.98623
## PC71 PC72 PC73 PC74 PC75 PC76 PC77
## Standard deviation 0.33971 0.32808 0.31296 0.31074 0.29668 0.28850 0.27115
## Proportion of Variance 0.00127 0.00118 0.00108 0.00106 0.00097 0.00091 0.00081
## Cumulative Proportion 0.98750 0.98869 0.98976 0.99082 0.99179 0.99270 0.99351
## PC78 PC79 PC80 PC81 PC82 PC83 PC84
## Standard deviation 0.26410 0.2532 0.24542 0.23750 0.23084 0.22053 0.20787
## Proportion of Variance 0.00077 0.0007 0.00066 0.00062 0.00059 0.00053 0.00047
## Cumulative Proportion 0.99428 0.9950 0.99565 0.99627 0.99685 0.99739 0.99786
## PC85 PC86 PC87 PC88 PC89 PC90 PC91
## Standard deviation 0.20240 0.19391 0.17881 0.15860 0.15075 0.14293 0.12598
## Proportion of Variance 0.00045 0.00041 0.00035 0.00028 0.00025 0.00022 0.00017
## Cumulative Proportion 0.99831 0.99872 0.99907 0.99935 0.99960 0.99983 1.00000
res_struct.var <- get_pca_var(res_struct.pca)
for (axis in seq.int(1,19)){
print(fviz_contrib(res_struct.pca, choice = "var", axes = axis, top = 10))
}
fviz_pca_var(res_struct.pca,
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
For the FA data, we need 5 PCs to explain 75% of the variance:
PC1: Inferior fronto-occipital fasciculus, forceps, anterior thalamic radiation (but really captures a lot of variance from many structures)
PC2: Cingulum/hippocampus
PC3: Cingulum
PC4: Superior longitudinal fasciculus (temporal part) in the left hemisphere and corticospinal tract
PC5: Superior longitudinal fasciculus (temporal part) in the right hemisphere and cingulum
res_FA.pca <- prcomp(FA_Data[,2:21], scale = TRUE)
fviz_eig(res_FA.pca)
summary(res_FA.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 3.2423 1.24533 1.2141 1.02058 1.00489 0.86869 0.82390
## Proportion of Variance 0.5256 0.07754 0.0737 0.05208 0.05049 0.03773 0.03394
## Cumulative Proportion 0.5256 0.60316 0.6769 0.72894 0.77943 0.81716 0.85111
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.72827 0.64431 0.56960 0.53522 0.49422 0.47438 0.45150
## Proportion of Variance 0.02652 0.02076 0.01622 0.01432 0.01221 0.01125 0.01019
## Cumulative Proportion 0.87762 0.89838 0.91460 0.92893 0.94114 0.95239 0.96258
## PC15 PC16 PC17 PC18 PC19 PC20
## Standard deviation 0.41569 0.38081 0.36814 0.33655 0.31425 0.28804
## Proportion of Variance 0.00864 0.00725 0.00678 0.00566 0.00494 0.00415
## Cumulative Proportion 0.97122 0.97847 0.98525 0.99091 0.99585 1.00000
res_FA.var <- get_pca_var(res_FA.pca)
for (axis in seq.int(1,5)){
print(fviz_contrib(res_FA.pca, choice = "var", axes = axis, top = 10))
}
fviz_pca_var(res_FA.pca,
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
For the EEG data, we see that 11 PCs are necessary to explain 75% of the variance.
PC1: Alpha and beta power at high load during probe
PC2: ERSPs at probe during low-load trials; specifically, low gamma
PC3: Alpha and beta power at delay, cue P3
PC4: Low gamma
PC5: Theta power, N170 amplitude
PC6: Theta power during cue, probe P3
PC7: CDA across lower-load conditions (mostly 1 dot, some 3 dots), P3
PC8: CDA across higher-load conditions (mostly 5 dots, some 3 dots), N170 at high load
PC9: P3 component
PC10: CDA; the 5-dot condition has the most variance, but really a mix
PC11: CDA; the 1-dot condition and also the 5-dot condition (though high load contributes less)
res_EEG.pca <- prcomp(EEG_df[complete.cases(EEG_df),2:66], scale = TRUE)
fviz_eig(res_EEG.pca)
summary(res_EEG.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 3.5923 2.6623 2.28645 2.23038 1.97288 1.79846 1.70510
## Proportion of Variance 0.1985 0.1090 0.08043 0.07653 0.05988 0.04976 0.04473
## Cumulative Proportion 0.1985 0.3076 0.38800 0.46454 0.52442 0.57418 0.61891
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.62484 1.50068 1.41758 1.31305 1.24195 1.20095 1.13235
## Proportion of Variance 0.04062 0.03465 0.03092 0.02652 0.02373 0.02219 0.01973
## Cumulative Proportion 0.65952 0.69417 0.72509 0.75161 0.77534 0.79753 0.81726
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 1.03640 1.00283 0.99154 0.9505 0.91791 0.8569 0.7476
## Proportion of Variance 0.01653 0.01547 0.01513 0.0139 0.01296 0.0113 0.0086
## Cumulative Proportion 0.83378 0.84925 0.86438 0.8783 0.89124 0.9025 0.9111
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.69197 0.69084 0.65622 0.63315 0.61055 0.59280 0.55211
## Proportion of Variance 0.00737 0.00734 0.00662 0.00617 0.00574 0.00541 0.00469
## Cumulative Proportion 0.91850 0.92584 0.93247 0.93864 0.94437 0.94978 0.95447
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 0.51556 0.51158 0.49653 0.46572 0.44426 0.41169 0.40618
## Proportion of Variance 0.00409 0.00403 0.00379 0.00334 0.00304 0.00261 0.00254
## Cumulative Proportion 0.95856 0.96258 0.96638 0.96971 0.97275 0.97536 0.97789
## PC36 PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 0.3863 0.38145 0.35916 0.33437 0.30273 0.29564 0.29304
## Proportion of Variance 0.0023 0.00224 0.00198 0.00172 0.00141 0.00134 0.00132
## Cumulative Proportion 0.9802 0.98243 0.98441 0.98613 0.98754 0.98889 0.99021
## PC43 PC44 PC45 PC46 PC47 PC48 PC49
## Standard deviation 0.28214 0.25737 0.24047 0.23730 0.23107 0.22168 0.20836
## Proportion of Variance 0.00122 0.00102 0.00089 0.00087 0.00082 0.00076 0.00067
## Cumulative Proportion 0.99143 0.99245 0.99334 0.99421 0.99503 0.99579 0.99645
## PC50 PC51 PC52 PC53 PC54 PC55 PC56
## Standard deviation 0.19864 0.1804 0.17544 0.15914 0.14127 0.13091 0.12498
## Proportion of Variance 0.00061 0.0005 0.00047 0.00039 0.00031 0.00026 0.00024
## Cumulative Proportion 0.99706 0.9976 0.99804 0.99843 0.99873 0.99900 0.99924
## PC57 PC58 PC59 PC60 PC61 PC62 PC63
## Standard deviation 0.1147 0.10203 0.09564 0.09081 0.07391 0.05689 1.535e-07
## Proportion of Variance 0.0002 0.00016 0.00014 0.00013 0.00008 0.00005 0.000e+00
## Cumulative Proportion 0.9994 0.99960 0.99974 0.99987 0.99995 1.00000 1.000e+00
## PC64 PC65
## Standard deviation 1.484e-07 1.085e-07
## Proportion of Variance 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00
res_EEG.var <- get_pca_var(res_EEG.pca)
for (axis in seq.int(1,11)){
print(fviz_contrib(res_EEG.pca, choice = "var", axes = axis, top = 10))
}
fviz_pca_var(res_EEG.pca,
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
For the resting state data, we see that 8 PCs are necessary to explain 75% of the variance.
PC1: Participation coefficients across networks (both average and individual networks)
PC2: FPCN connectivity with other networks; within-network connectivity for DMN and DAN
PC3: Within-network connectivity for DMN; FPCN connectivity with visual, CO, DMN and DAN networks
PC4: Global efficiency, within-CO connectivity, participation coefficients
PC5: Within-visual connectivity and participation coefficient, within-CO connectivity
PC6: Louvain modularity, within-DAN connectivity
PC7: Within-network VAN connectivity
PC8: Global efficiency; FPCN connectivity with DMN, FPCN and DAN; VAN intranetwork connectivity
res_RS.pca <- prcomp(RS_df[complete.cases(RS_df),2:21], scale = TRUE)
fviz_eig(res_RS.pca)
summary(res_RS.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.1445 2.0285 1.29812 1.13191 1.04771 0.98921 0.94066
## Proportion of Variance 0.2299 0.2057 0.08426 0.06406 0.05489 0.04893 0.04424
## Cumulative Proportion 0.2299 0.4357 0.51995 0.58401 0.63890 0.68783 0.73207
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.90541 0.84708 0.8087 0.76047 0.75206 0.68202 0.60154
## Proportion of Variance 0.04099 0.03588 0.0327 0.02892 0.02828 0.02326 0.01809
## Cumulative Proportion 0.77306 0.80893 0.8416 0.87055 0.89883 0.92209 0.94018
## PC15 PC16 PC17 PC18 PC19 PC20
## Standard deviation 0.5745 0.50112 0.46856 0.41335 0.39669 0.25988
## Proportion of Variance 0.0165 0.01256 0.01098 0.00854 0.00787 0.00338
## Cumulative Proportion 0.9567 0.96923 0.98021 0.98875 0.99662 1.00000
res_RS.var <- get_pca_var(res_RS.pca)
for (axis in seq.int(1,8)){
print(fviz_contrib(res_RS.pca, choice = "var", axes = axis, top = 10))
}
fviz_pca_var(res_RS.pca,
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
Now that we have PCs that absorb the overlapping variance, we can use them in a series of regressions to predict span, BPRS and accuracy at high load. We will use several different feature selection methods: stepwise regression, ridge regression, LASSO regression and ElasticNet regression. A brief description of each method is as follows:
Stepwise Regression: Starts with all parameters and sequentially removes them to minimize the overall AIC of the model. At each step, it calculates what the AIC would be after removing each individual parameter, then removes the one that improves AIC the most. It stops when no removal lowers the AIC further.
Ridge Regression: A penalized linear regression that uses the L2 norm (penalizes the sum of the squared coefficients). Shrinks the values of unimportant coefficients close to zero, but does not remove them.
LASSO Regression: A penalized linear regression that uses the L1 norm (penalizes the sum of the absolute values of the coefficients). Forces coefficients that are unimportant to the model to exactly zero, creating a sparse model.
ElasticNet Regression: Combines the L1 and L2 penalties, shrinking some coefficients close to zero and setting others exactly to zero.
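In glmnet (which caret wraps below), these three penalized models differ only in the `alpha` argument: `alpha = 0` is ridge, `alpha = 1` is LASSO, and intermediate values are ElasticNet. A minimal standalone sketch on simulated data illustrating the sparsity difference (all variable names here are illustrative):

```r
library(glmnet)

set.seed(1)
X <- matrix(rnorm(100 * 10), nrow = 100)   # 100 observations, 10 predictors
y <- X[, 1] - 2 * X[, 2] + rnorm(100)      # only the first two predictors matter

# alpha mixes the penalties: 0 = ridge (L2), 1 = LASSO (L1), in between = ElasticNet
ridge_fit   <- cv.glmnet(X, y, alpha = 0)
lasso_fit   <- cv.glmnet(X, y, alpha = 1)
elastic_fit <- cv.glmnet(X, y, alpha = 0.5)

# LASSO sets unimportant coefficients exactly to zero; ridge only shrinks them
sum(coef(lasso_fit, s = "lambda.min") == 0)
sum(coef(ridge_fit, s = "lambda.min") == 0)
```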
Each model is fit within a 10-fold cross-validation with inner hyperparameter tuning. In each outer fold, 10% of subjects are held out, and R squared is calculated from the best model's predictions on those held-out subjects, so every subject receives an out-of-sample prediction.
When listing the measures that survive feature selection in each model, we report the features that appear in 7 or more of the 10 cross-validation iterations.
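This "7 or more folds" rule can be applied mechanically by counting the "x" marks across cross-validation iterations. A small sketch, assuming a selection table shaped like the `*_best_coeffs` data frames built below (`stable_features` and `toy` are illustrative names):

```r
# Count how often each feature was selected across the 10 CV folds and keep
# those appearing in at least min_folds; selection_table is a 10 x n_features
# data frame with "x" where a feature survived in that fold, NA otherwise.
stable_features <- function(selection_table, min_folds = 7) {
  counts <- colSums(!is.na(selection_table))
  names(counts)[counts >= min_folds]
}

# toy example: feature "a" selected in 8/10 folds, "b" in only 3/10
toy <- data.frame(a = c(rep("x", 8), NA, NA), b = c(rep("x", 3), rep(NA, 7)))
stable_features(toy)   # "a"
```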
A different model explains the most variance for each behavioral measure. However, ElasticNet performs most consistently across the three measures (roughly tied for second best for span, second best for BPRS and best for high load accuracy), so we will use it as the model to compare across behavioral measures.
reg_data <- data.frame(PTID = constructs_fMRI$PTID,
span = data_for_reg$span,
high_acc = data_for_reg$high_acc,
BPRS = data_for_reg$BPRS)
univ_PCs <- data.frame(PTID = constructs_fMRI$PTID[!is.na(data_for_reg$L_CUE_LE)], res.pca[["x"]][,1:3])
colnames(univ_PCs)[2:4] <- paste("PC",c(1:3),"_univ",sep="")
sim_PCs <- data.frame(PTID = constructs_fMRI$PTID, res_sim.pca[["x"]][,1:5])
colnames(sim_PCs)[2:6] <- paste("PC",c(1:5),"_sim",sep="")
MVPA_PCs <- data.frame(PTID = constructs_fMRI$PTID, res_MVPA.pca[["x"]][,1:12])
colnames(MVPA_PCs)[2:13] <- paste("PC",c(1:12),"_MVPA",sep="")
EEG_PCs <- data.frame(PTID = EEG_df$PTID[complete.cases(EEG_df)], res_EEG.pca[["x"]][,1:11])
colnames(EEG_PCs)[2:12] <- paste("PC",c(1:11),"_EEG",sep="")
RS_PCs <- data.frame(PTID = RS_df$PTID, res_RS.pca[["x"]][,1:8])
colnames(RS_PCs)[2:9] <- paste("PC",c(1:8),"_RS",sep="")
struc_PCs <- data.frame(PTID=struc_df$ID, res_struct.pca[["x"]][,1:19])
colnames(struc_PCs)[2:20] <- paste("PC",c(1:19),"_struc",sep="")
FA_PCs <- data.frame(PTID = FA_Data$ID, res_FA.pca[["x"]][,1:5])
colnames(FA_PCs)[2:6] <- paste("PC", c(1:5), "_FA", sep="")
reg_data <- Reduce(function(x,y) merge(x=x, y=y, by = "PTID", all.x = TRUE),
list(reg_data, univ_PCs, sim_PCs, MVPA_PCs, EEG_PCs, RS_PCs, struc_PCs, FA_PCs))
#add scanner, age, gender (won't include CS/NCS in regression) and dummy code
reg_data <- merge(reg_data, p200_demographics, by = "PTID")
reg_data$SCANNER <- reg_data$SCANNER - 1
reg_data$GENDER <- reg_data$GENDER - 1
reg_data <- reg_data[complete.cases(reg_data),1:70]
set.seed(123)
lambda <- 10^seq(-3, 3, length = 100)
# save(list = c("univ_PCs", "sim_PCs", "MVPA_PCs", "EEG_PCs", "RS_PCs", "struc_PCs", "FA_PCs"), file = "~/Documents/Code/RDoC_for_GitHub/data/PCs.RData")
While the final stepwise model is significant overall, we see significant effects of:
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_step_span <- list()
RMSE_cv_step_span <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(2, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(2, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
step.span <- train(span ~., data = train_data,
method = "lmStepAIC",
trControl = trainControl("cv", number = 10),
trace = FALSE
)
best_models_step_span[[fold]] <- step.span$finalModel
preds <- step.span %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_step_span <- data.frame(true = test_data$span, pred = preds)
}else{
cv_preds_step_span <- rbind(cv_preds_step_span, data.frame(true = test_data$span, pred = preds))}
RMSE_cv_step_span[fold] <- RMSE(preds, test_data$span)
}
model_comparison_template <- list(univ = data.frame(matrix(nrow=10, ncol=3)),
sim = data.frame(matrix(nrow=10, ncol=5)),
MVPA = data.frame(matrix(nrow=10, ncol=12)),
EEG = data.frame(matrix(nrow=10, ncol=11)),
RS = data.frame(matrix(nrow=10, ncol=8)),
FA = data.frame(matrix(nrow=10, ncol=5)),
struc = data.frame(matrix(nrow=10, ncol=19)),
demo = data.frame(matrix(nrow=10, ncol=3)))
colnames(model_comparison_template[["univ"]]) <- paste("PC",c(1:3),"_univ",sep="")
colnames(model_comparison_template[["sim"]]) <- paste("PC",c(1:5),"_sim",sep="")
colnames(model_comparison_template[["MVPA"]]) <- paste("PC",c(1:12),"_MVPA",sep="")
colnames(model_comparison_template[["EEG"]]) <- paste("PC",c(1:11),"_EEG",sep="")
colnames(model_comparison_template[["RS"]]) <- paste("PC",c(1:8),"_RS",sep="")
colnames(model_comparison_template[["struc"]]) <- paste("PC",c(1:19),"_struc",sep="")
colnames(model_comparison_template[["FA"]]) <- paste("PC", c(1:5), "_FA", sep="")
colnames(model_comparison_template[["demo"]]) <- c("GENDER", "AGE", "SCANNER")
step_span_best_coeffs <- model_comparison_template
for (fold in seq.int(1,10)){
coefs <- rownames(data.frame(best_models_step_span[[fold]]$coefficients))
split_coef <- strsplit(coefs, split="_")
for (row in seq.int(2,length(coefs))){
if (length(split_coef[[row]]) > 1){
PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
step_span_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
}
}
if ("GENDER" %in% coefs){
step_span_best_coeffs[["demo"]]$GENDER[fold] <- "x"  # mark this fold only
}
if ("AGE" %in% coefs){
step_span_best_coeffs[["demo"]]$AGE[fold] <- "x"
}
if ("SCANNER" %in% coefs){
step_span_best_coeffs[["demo"]]$SCANNER[fold] <- "x"
}
}
compare_span <- data.frame(matrix(nrow=2, ncol=4))
colnames(compare_span) <- c("stepwise", "ridge", "lasso", "elastic")
rownames(compare_span) <- c("RMSE", "Rsquared")
compare_span$stepwise[1] <- rowMeans(RMSE_cv_step_span)
compare_span$stepwise[2] <- cor(cv_preds_step_span$true, cv_preds_step_span$pred)^2
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_ridge_span <- list()
RMSE_cv_ridge_span <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(2, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(2, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
ridge.span <- train(
span ~., data = train_data, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneGrid = expand.grid(alpha = 0, lambda = lambda)
)
best_models_ridge_span[[fold]] <- coef(ridge.span$finalModel, ridge.span$bestTune$lambda)
preds <- ridge.span %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_ridge_span <- data.frame(true = test_data$span, pred = preds)
}else{
cv_preds_ridge_span <- rbind(cv_preds_ridge_span, data.frame(true = test_data$span, pred = preds))}
RMSE_cv_ridge_span[fold] <- RMSE(preds, test_data$span)
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_span$ridge[1] <- rowMeans(RMSE_cv_ridge_span)
compare_span$ridge[2] <- cor(cv_preds_ridge_span$true, cv_preds_ridge_span$pred)^2
In the LASSO regression, we see:
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_lasso_span <- list()
RMSE_cv_lasso_span <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(2, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(2, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
lasso.span <- train(
span ~., data =train_data, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneGrid = expand.grid(alpha = 1, lambda = lambda)
)
best_models_lasso_span[[fold]] <- coef(lasso.span$finalModel, lasso.span$bestTune$lambda)
preds <- lasso.span %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_lasso_span <- data.frame(true = test_data$span, pred = preds)
}else{
cv_preds_lasso_span <- rbind(cv_preds_lasso_span, data.frame(true = test_data$span, pred = preds))}
RMSE_cv_lasso_span[fold] <- RMSE(preds, test_data$span)
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_span$lasso[1] <- rowMeans(RMSE_cv_lasso_span)
compare_span$lasso[2] <- cor(cv_preds_lasso_span$true, cv_preds_lasso_span$pred)^2
lasso_span_best_coeffs <- model_comparison_template
for (fold in seq.int(1,10)){
coefs <- rownames(best_models_lasso_span[[fold]])
split_coef <- strsplit(coefs, split="_")
for (row in seq.int(2,length(coefs))){
if (length(split_coef[[row]]) > 1){
PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
if (best_models_lasso_span[[fold]][row]!= 0){
lasso_span_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
}
}
}
# for the penalized models, a variable counts as selected only if its
# coefficient is nonzero in this fold
if (best_models_lasso_span[[fold]]["GENDER", 1] != 0){
lasso_span_best_coeffs[["demo"]]$GENDER[fold] <- "x"
}
if (best_models_lasso_span[[fold]]["AGE", 1] != 0){
lasso_span_best_coeffs[["demo"]]$AGE[fold] <- "x"
}
if (best_models_lasso_span[[fold]]["SCANNER", 1] != 0){
lasso_span_best_coeffs[["demo"]]$SCANNER[fold] <- "x"
}
}
These features seem to be weighted particularly towards encoding and towards regions associated with visual processing. The structural measures show some hippocampal involvement. While accuracy seems to be weighted only towards visual components, predicting span also draws on measures at low load, from the delay period, and from all modalities.
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_elastic_span <- list()
RMSE_cv_elastic_span <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(2, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(2, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
elastic.span <- train(
span ~., data = train_data, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneLength = 10
)
best_models_elastic_span[[fold]] <- elastic.span$finalModel
preds <- elastic.span %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_elastic_span <- data.frame(true = test_data$span, pred = preds)
}else{
cv_preds_elastic_span <- rbind(cv_preds_elastic_span, data.frame(true = test_data$span, pred = preds))}
RMSE_cv_elastic_span[fold] <- RMSE(preds, test_data$span)
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_span$elastic[1] <- rowMeans(RMSE_cv_elastic_span)
compare_span$elastic[2] <- cor(cv_preds_elastic_span$true, cv_preds_elastic_span$pred)^2
elastic_span_best_coeffs <- model_comparison_template
for (fold in seq.int(1,10)){
coefs <- coef(best_models_elastic_span[[fold]], s = best_models_elastic_span[[fold]]$tuneValue$lambda)
split_coef <- strsplit(rownames(coefs), split="_")
for (row in seq.int(2,length(coefs))){
if (length(split_coef[[row]]) > 1){
PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
if (coefs[row]!= 0){
elastic_span_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
}
}
}
# a variable counts as selected only if its coefficient is nonzero in this fold
if (coefs["GENDER", 1] != 0){
elastic_span_best_coeffs[["demo"]]$GENDER[fold] <- "x"
}
if (coefs["AGE", 1] != 0){
elastic_span_best_coeffs[["demo"]]$AGE[fold] <- "x"
}
if (coefs["SCANNER", 1] != 0){
elastic_span_best_coeffs[["demo"]]$SCANNER[fold] <- "x"
}
}
When comparing models, LASSO is our best model that performs feature selection, explaining ~12% of the variance in span; ridge regression is close behind, explaining ~11%.
compare_span
## stepwise ridge lasso elastic
## RMSE 1.229236514 0.9107650 0.9069912 0.93027114
## Rsquared 0.006068862 0.1053506 0.1215684 0.08044747
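One caveat when reading these tables: the R squared reported here is the squared correlation between held-out predictions and true values, which is not the same as the predictive R squared (1 - SSE/SST); the latter can even go negative on held-out data when predictions are worse than the mean. A small synthetic illustration:

```r
set.seed(2)
true <- rnorm(50)
pred <- 0.3 * true + rnorm(50, sd = 1)   # weakly informative predictions

r2_cor  <- cor(true, pred)^2                                     # squared correlation
r2_pred <- 1 - sum((true - pred)^2) / sum((true - mean(true))^2) # predictive R squared

c(r2_cor = r2_cor, r2_pred = r2_pred)   # the two can disagree
```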
To predict BPRS, the PCs that contain important information are:
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_step_BPRS <- list()
RMSE_cv_step_BPRS <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(4, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(4, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
step.BPRS <- train(BPRS ~., data = train_data,
method = "lmStepAIC",
trControl = trainControl("cv", number = 10),
trace = FALSE
)
best_models_step_BPRS[[fold]] <- step.BPRS$finalModel
preds <- step.BPRS %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_step_BPRS <- data.frame(true = test_data$BPRS, pred = preds)
}else{
cv_preds_step_BPRS <- rbind(cv_preds_step_BPRS, data.frame(true = test_data$BPRS, pred = preds))}
RMSE_cv_step_BPRS[fold] <- RMSE(preds, test_data$BPRS)
}
step_BPRS_best_coeffs <- model_comparison_template
for (fold in seq.int(1,10)){
coefs <- rownames(data.frame(best_models_step_BPRS[[fold]]$coefficients))
split_coef <- strsplit(coefs, split="_")
for (row in seq.int(2,length(coefs))){
if (length(split_coef[[row]]) > 1){
PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
step_BPRS_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
}
}
if ("GENDER" %in% coefs){
step_BPRS_best_coeffs[["demo"]]$GENDER[fold] <- "x"  # mark this fold only
}
if ("AGE" %in% coefs){
step_BPRS_best_coeffs[["demo"]]$AGE[fold] <- "x"
}
if ("SCANNER" %in% coefs){
step_BPRS_best_coeffs[["demo"]]$SCANNER[fold] <- "x"
}
}
compare_BPRS <- data.frame(matrix(nrow=2, ncol=4))
colnames(compare_BPRS) <- c("stepwise", "ridge", "lasso", "elastic")
rownames(compare_BPRS) <- c("RMSE", "Rsquared")
compare_BPRS$stepwise[1] <- rowMeans(RMSE_cv_step_BPRS)
compare_BPRS$stepwise[2] <- cor(cv_preds_step_BPRS$true, cv_preds_step_BPRS$pred)^2
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_ridge_BPRS <- list()
RMSE_cv_ridge_BPRS <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(4, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(4, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
ridge.BPRS <- train(
BPRS ~., data = train_data, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneGrid = expand.grid(alpha = 0, lambda = lambda)
)
best_models_ridge_BPRS[[fold]] <- coef(ridge.BPRS$finalModel, ridge.BPRS$bestTune$lambda)
preds <- ridge.BPRS %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_ridge_BPRS <- data.frame(true = test_data$BPRS, pred = preds)
}else{
cv_preds_ridge_BPRS <- rbind(cv_preds_ridge_BPRS, data.frame(true = test_data$BPRS, pred = preds))}
RMSE_cv_ridge_BPRS[fold] <- RMSE(preds, test_data$BPRS)
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_BPRS$ridge[1] <- rowMeans(RMSE_cv_ridge_BPRS)
compare_BPRS$ridge[2] <- cor(cv_preds_ridge_BPRS$true, cv_preds_ridge_BPRS$pred)^2
LASSO gives us almost a combination of the above analyses:
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_lasso_BPRS <- list()
RMSE_cv_lasso_BPRS <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(4, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(4, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
lasso.BPRS <- train(
BPRS ~., data =train_data, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneGrid = expand.grid(alpha = 1, lambda = lambda)
)
best_models_lasso_BPRS[[fold]] <- coef(lasso.BPRS$finalModel, lasso.BPRS$bestTune$lambda)
preds <- lasso.BPRS %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_lasso_BPRS <- data.frame(true = test_data$BPRS, pred = preds)
}else{
cv_preds_lasso_BPRS <- rbind(cv_preds_lasso_BPRS, data.frame(true = test_data$BPRS, pred = preds))}
RMSE_cv_lasso_BPRS[fold] <- RMSE(preds, test_data$BPRS)
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_BPRS$lasso[1] <- rowMeans(RMSE_cv_lasso_BPRS)
compare_BPRS$lasso[2] <- cor(cv_preds_lasso_BPRS$true, cv_preds_lasso_BPRS$pred)^2
lasso_BPRS_best_coeffs <- model_comparison_template
for (fold in seq.int(1,10)){
coefs <- rownames(best_models_lasso_BPRS[[fold]])
split_coef <- strsplit(coefs, split="_")
for (row in seq.int(2,length(coefs))){
if (length(split_coef[[row]]) > 1){
PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
if (best_models_lasso_BPRS[[fold]][row]!= 0){
lasso_BPRS_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
}
}
}
# a variable counts as selected only if its coefficient is nonzero in this fold
if (best_models_lasso_BPRS[[fold]]["GENDER", 1] != 0){
lasso_BPRS_best_coeffs[["demo"]]$GENDER[fold] <- "x"
}
if (best_models_lasso_BPRS[[fold]]["AGE", 1] != 0){
lasso_BPRS_best_coeffs[["demo"]]$AGE[fold] <- "x"
}
if (best_models_lasso_BPRS[[fold]]["SCANNER", 1] != 0){
lasso_BPRS_best_coeffs[["demo"]]$SCANNER[fold] <- "x"
}
}
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_elastic_BPRS <- list()
RMSE_cv_elastic_BPRS <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(4, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(4, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
elastic.BPRS <- train(
BPRS ~., data = train_data, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneLength = 10
)
best_models_elastic_BPRS[[fold]] <- elastic.BPRS$finalModel
preds <- elastic.BPRS %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_elastic_BPRS <- data.frame(true = test_data$BPRS, pred = preds)
}else{
cv_preds_elastic_BPRS <- rbind(cv_preds_elastic_BPRS, data.frame(true = test_data$BPRS, pred = preds))}
RMSE_cv_elastic_BPRS[fold] <- RMSE(preds, test_data$BPRS)
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_BPRS$elastic[1] <- rowMeans(RMSE_cv_elastic_BPRS)
compare_BPRS$elastic[2] <- cor(cv_preds_elastic_BPRS$true, cv_preds_elastic_BPRS$pred)^2
elastic_BPRS_best_coeffs <- model_comparison_template
for (fold in seq.int(1,10)){
coefs <- coef(best_models_elastic_BPRS[[fold]], s = best_models_elastic_BPRS[[fold]]$tuneValue$lambda)
split_coef <- strsplit(rownames(coefs), split="_")
for (row in seq.int(2,length(coefs))){
if (length(split_coef[[row]]) > 1){
PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
if (coefs[row]!= 0){
elastic_BPRS_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
}
}
}
# a variable counts as selected only if its coefficient is nonzero in this fold
if (coefs["GENDER", 1] != 0){
elastic_BPRS_best_coeffs[["demo"]]$GENDER[fold] <- "x"
}
if (coefs["AGE", 1] != 0){
elastic_BPRS_best_coeffs[["demo"]]$AGE[fold] <- "x"
}
if (coefs["SCANNER", 1] != 0){
elastic_BPRS_best_coeffs[["demo"]]$SCANNER[fold] <- "x"
}
}
Here, stepwise regression is the best model that performs feature selection and explains the most variance in BPRS (~10%), although it also has the highest RMSE. Ridge is next, explaining ~6%.
compare_BPRS
## stepwise ridge lasso elastic
## RMSE 1.06439845 0.93569897 0.95144131 0.97678297
## Rsquared 0.09644658 0.06325695 0.03558682 0.01704404
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_step_high_acc <- list()
RMSE_cv_step_high_acc <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(3, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(3, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
step.high_acc <- train(high_acc ~., data = train_data,
method = "lmStepAIC",
trControl = trainControl("cv", number = 10),
trace = FALSE
)
best_models_step_high_acc[[fold]] <- step.high_acc$finalModel
preds <- step.high_acc %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_step_high_acc <- data.frame(true = test_data$high_acc, pred = preds)
}else{
cv_preds_step_high_acc <- rbind(cv_preds_step_high_acc, data.frame(true = test_data$high_acc, pred = preds))}
RMSE_cv_step_high_acc[fold] <- RMSE(preds, test_data$high_acc)
}
step_high_acc_best_coeffs <- model_comparison_template
for (fold in seq.int(1,10)){
coefs <- rownames(data.frame(best_models_step_high_acc[[fold]]$coefficients))
split_coef <- strsplit(coefs, split="_")
for (row in seq.int(2,length(coefs))){
if (length(split_coef[[row]]) > 1){
PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
step_high_acc_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
}
}
if ("GENDER" %in% coefs){
step_high_acc_best_coeffs[["demo"]]$GENDER[fold] <- "x"  # mark this fold only
}
if ("AGE" %in% coefs){
step_high_acc_best_coeffs[["demo"]]$AGE[fold] <- "x"
}
if ("SCANNER" %in% coefs){
step_high_acc_best_coeffs[["demo"]]$SCANNER[fold] <- "x"
}
}
compare_high_acc <- data.frame(matrix(nrow=2, ncol=4))
colnames(compare_high_acc) <- c("stepwise", "ridge", "lasso", "elastic")
rownames(compare_high_acc) <- c("RMSE", "Rsquared")
compare_high_acc$stepwise[1] <- rowMeans(RMSE_cv_step_high_acc)
compare_high_acc$stepwise[2] <- cor(cv_preds_step_high_acc$true, cv_preds_step_high_acc$pred)^2
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_ridge_high_acc <- list()
RMSE_cv_high_acc <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(3, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(3, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
ridge.high_acc <- train(
high_acc ~., data = train_data, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneGrid = expand.grid(alpha = 0, lambda = lambda)
)
best_models_ridge_high_acc[[fold]] <- coef(ridge.high_acc$finalModel, ridge.high_acc$bestTune$lambda)
preds <- ridge.high_acc %>% predict(test_data)
# Make predictions
if (fold == 1){
cv_preds_high_acc <- data.frame(true = test_data$high_acc, pred = preds)
}else{
cv_preds_high_acc <- rbind(cv_preds_high_acc, data.frame(true = test_data$high_acc, pred = preds))}
RMSE_cv_high_acc[fold] <- RMSE(preds, test_data$high_acc)
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_high_acc$ridge[1] <- rowMeans(RMSE_cv_high_acc)
compare_high_acc$ridge[2] <- cor(cv_preds_high_acc$true, cv_preds_high_acc$pred)^2
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_lasso_high_acc <- list()
RMSE_cv_lasso_high_acc <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(3, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(3, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
lasso.high_acc <- train(
high_acc ~., data =train_data, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneGrid = expand.grid(alpha = 1, lambda = lambda)
)
best_models_lasso_high_acc[[fold]] <- coef(lasso.high_acc$finalModel, lasso.high_acc$bestTune$lambda)
# Make predictions
preds <- lasso.high_acc %>% predict(test_data)
if (fold == 1){
cv_preds_lasso_high_acc <- data.frame(true = test_data$high_acc, pred = preds)
} else {
cv_preds_lasso_high_acc <- rbind(cv_preds_lasso_high_acc, data.frame(true = test_data$high_acc, pred = preds))
}
RMSE_cv_lasso_high_acc[fold] <- RMSE(preds, test_data$high_acc)
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_high_acc$lasso[1] <- rowMeans(RMSE_cv_lasso_high_acc)
compare_high_acc$lasso[2] <- cor(cv_preds_lasso_high_acc$true, cv_preds_lasso_high_acc$pred)^2
lasso_high_acc_best_coeffs <- model_comparison_template
for (fold in seq.int(1,10)){
coefs <- rownames(best_models_lasso_high_acc[[fold]])
split_coef <- strsplit(coefs, split="_")
for (row in seq.int(2,length(coefs))){
if (length(split_coef[[row]]) > 1){
PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
if (best_models_lasso_high_acc[[fold]][row] != 0){
lasso_high_acc_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
}
}
}
if ("GENDER" %in% coefs){
lasso_high_acc_best_coeffs[["demo"]]$GENDER <- "x"
}
if ("AGE" %in% coefs){
lasso_high_acc_best_coeffs[["demo"]]$AGE <- "x"
}
if ("SCANNER" %in% coefs){
lasso_high_acc_best_coeffs[["demo"]]$SCANNER <- "x"
}
}
When we look at ElasticNet, all coefficients play a role, though a handful carry considerably more weight than the rest.
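To make "most important" concrete, one simple option is to drop the intercept and any exactly-zero terms from the fitted coefficient vector and rank the remainder by absolute magnitude. The sketch below uses made-up coefficient names and values purely for illustration; it does not reproduce the fitted results.

```r
# Hypothetical named coefficients from a penalized fit (illustrative values only)
coefs <- c("(Intercept)" = 0.12, PC1_enc = 0.40, PC2_enc = 0.00,
           PC1_delay = -0.55, AGE = 0.08)
# Drop the intercept and exactly-zero terms, then rank by absolute size
kept <- coefs[names(coefs) != "(Intercept)" & coefs != 0]
sort(abs(kept), decreasing = TRUE)
```

The same ranking could be applied to the real fits by extracting each fold's coefficients with `coef()` at the tuned lambda, as done in the loops below.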
# set up outer cross validation
split <- vfold_cv(reg_data,v=10)
best_models_elastic_high_acc <- list()
RMSE_cv_elastic_high_acc <- data.frame(matrix(ncol=10, nrow=1))
for (fold in seq.int(1,10)){
# set up training and test set for fold
train_data <- analysis(split$splits[[fold]])
train_data <- train_data[,c(3, 5:70)]
test_data <- assessment(split$splits[[fold]])
test_data <- test_data[,c(3, 5:70)]
train_data[,1:67] <- sapply(train_data[,1:67],scale)
train_data <- data.frame(train_data)
test_data[,1:67] <- sapply(test_data[,1:67], scale)
test_data <- data.frame(test_data)
elastic.high_acc <- train(
high_acc ~., data = train_data, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneLength = 10
)
best_models_elastic_high_acc[[fold]] <- elastic.high_acc$finalModel
# Make predictions
preds <- elastic.high_acc %>% predict(test_data)
if (fold == 1){
cv_preds_elastic_high_acc <- data.frame(true = test_data$high_acc, pred = preds)
} else {
cv_preds_elastic_high_acc <- rbind(cv_preds_elastic_high_acc, data.frame(true = test_data$high_acc, pred = preds))
}
RMSE_cv_elastic_high_acc[fold] <- RMSE(preds, test_data$high_acc)
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_high_acc$elastic[1] <- rowMeans(RMSE_cv_elastic_high_acc)
compare_high_acc$elastic[2] <- cor(cv_preds_elastic_high_acc$true, cv_preds_elastic_high_acc$pred)^2
elastic_high_acc_best_coeffs <- model_comparison_template
for (fold in seq.int(1,10)){
coefs <- coef(best_models_elastic_high_acc[[fold]], s = best_models_elastic_high_acc[[fold]]$tuneValue$lambda)
split_coef <- strsplit(rownames(coefs), split="_")
for (row in seq.int(2,length(coefs))){
if (length(split_coef[[row]]) > 1){
PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
if (coefs[row] != 0){
elastic_high_acc_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
}
}
}
if ("GENDER" %in% rownames(coefs)){
elastic_high_acc_best_coeffs[["demo"]]$GENDER <- "x"
}
if ("AGE" %in% rownames(coefs)){
elastic_high_acc_best_coeffs[["demo"]]$AGE <- "x"
}
if ("SCANNER" %in% rownames(coefs)){
elastic_high_acc_best_coeffs[["demo"]]$SCANNER <- "x"
}
}
Here, ElasticNet is the best of our models that perform feature selection, explaining about 12.5% of the variance in accuracy at high load. But just as with the previous behavioral measures, Ridge regression performs best overall, explaining about 13% of the variance.
compare_high_acc
## stepwise ridge lasso elastic
## RMSE 1.03781127 0.9009103 0.9082412 0.9043657
## Rsquared 0.08459256 0.1321430 0.1168125 0.1250408
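Reading the table programmatically, the winner under either criterion can be pulled straight from the comparison data frame. A small sketch, using `compare` as an illustrative stand-in for `compare_high_acc` populated with the values printed above:

```r
# Illustrative stand-in for compare_high_acc, using the values printed above
compare <- data.frame(stepwise = c(1.0378, 0.0846),
                      ridge    = c(0.9009, 0.1321),
                      lasso    = c(0.9082, 0.1168),
                      elastic  = c(0.9044, 0.1250),
                      row.names = c("RMSE", "Rsquared"))
# Best model = lowest cross-validated RMSE, or highest cross-validated R-squared
names(which.min(unlist(compare["RMSE", ])))
names(which.max(unlist(compare["Rsquared", ])))
```

Both criteria agree here: Ridge has the lowest RMSE and the highest R-squared of the four models.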